CONFIDENCE BANDS IN NONPARAMETRIC TIME SERIES REGRESSION



By Zhibiao Zhao and Wei Biao Wu

Pennsylvania State University and University of Chicago 

We consider nonparametric estimation of mean regression and conditional variance (or volatility) functions in nonlinear stochastic regression models. Simultaneous confidence bands are constructed and the coverage probabilities are shown to be asymptotically correct. The imposed dependence structure allows applications in many linear and nonlinear auto-regressive processes. The results are applied to the S&P 500 Index data.

1. Introduction. There are two popular approaches in time series analysis: parametric and nonparametric methods. In the literature various parametric models have been proposed, including the classical ARMA, threshold AR (TAR: Tong [32]), exponential AR (EAR: Haggan and Ozaki [21]) and AR with conditional heteroscedasticity (ARCH: Engle [10]), among others. These models are widely used in practice. An attractive feature of parametric models is that they can provide explanatory insights into the dynamical characteristics of the underlying data-generating mechanism.

However, a parametric model performs well only when it is indeed the true model or a good approximation of it. Thus, for parametric models, modelling bias may arise and there is a risk of mis-specification that can lead to misunderstanding of the truth and wrong conclusions. One traditional approach is to consider a larger family of parametric models, hoping that bigger models could better approximate the true one. Another appealing way out is to use nonparametric techniques, which let the data "speak for themselves" by imposing no specific structures on the underlying regression functions other than smoothness assumptions. See Fan and Yao [17] for an extensive exposition of nonparametric time series analysis.

Nonparametric estimates can be used to test parametric assumptions. The problem of nonparametric model validation under dependence is important but difficult. Fan and Yao [17] dealt with this deep problem for time series data by using the idea of the generalized likelihood ratio test (Fan, Zhang and Zhang [18]), which was developed for independent data. Fan and Yao [17] pointed out that there have been virtually no theoretical developments on nonparametric model validation under dependence, despite the importance of this problem, since dependence is an intrinsic characteristic of time series.

Here we consider the model validation problem for the stochastic regression model

$$Y_i = \mu(X_i) + \sigma(X_i)\varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{1.1}$$

where $\varepsilon_i$ are independent and identically distributed (i.i.d.) random noises and $(X_i, Y_i)$ are observations. The functions $\mu(\cdot)$ and $\sigma(\cdot)$ are the mean regression and conditional variance (or volatility) functions, respectively. As a special case, let $X_i = Y_{i-1}$. Then (1.1) reduces to the nonlinear autoregressive process $Y_i = \mu(Y_{i-1}) + \sigma(Y_{i-1})\varepsilon_i$, which includes many parametric time series models. For example, if $\mu(x) = ax$, or $\mu(x) = a\max(x, 0) + b\min(x, 0)$, or $\mu(x) = [a + b\exp(-cx^2)]x$, where $a, b, c$ are real parameters, then it becomes an AR, TAR or EAR process, respectively. For ARCH processes, $\mu = 0$ and $\sigma(x) = (a^2 + b^2x^2)^{1/2}$.
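As an illustration of how (1.1) with $X_i = Y_{i-1}$ covers the parametric examples above, the following minimal Python sketch simulates the four special cases. The parameter values $a = 0.5$, $b = -0.3$, $c = 1$, the standard normal noise, the starting value and the burn-in length are illustrative assumptions, not choices made in the paper.

```python
import math
import random

def simulate_nlar(mu, sigma, n, y0=0.0, burn_in=500, seed=0):
    """Simulate Y_i = mu(Y_{i-1}) + sigma(Y_{i-1}) * eps_i with i.i.d. N(0, 1) noise."""
    rng = random.Random(seed)
    y = y0
    path = []
    for i in range(burn_in + n):
        y = mu(y) + sigma(y) * rng.gauss(0.0, 1.0)
        if i >= burn_in:  # discard the burn-in so the kept path is close to stationarity
            path.append(y)
    return path

a, b, c = 0.5, -0.3, 1.0  # illustrative parameter values (assumed)
models = {
    "AR":   (lambda x: a * x, lambda x: 1.0),
    "TAR":  (lambda x: a * max(x, 0.0) + b * min(x, 0.0), lambda x: 1.0),
    "EAR":  (lambda x: (a + b * math.exp(-c * x * x)) * x, lambda x: 1.0),
    "ARCH": (lambda x: 0.0, lambda x: math.sqrt(a**2 + b**2 * x**2)),
}
paths = {name: simulate_nlar(m, s, n=1000) for name, (m, s) in models.items()}
```

Each entry of `paths` is one realization of length 1000; only the pair $(\mu, \sigma)$ changes between models, which is the point of writing them all in the form (1.1).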

In (1.1), if we let $Y_i = X_{i+1} - X_i$ and $\varepsilon_i$ be i.i.d. standard normals, then (1.1) can be viewed as a discretized version of the stochastic diffusion model

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t, \tag{1.2}$$

where $\{W_t\}$ is a standard Brownian motion. Many well-known financial models are special cases of (1.2); see Fan [13] and references therein. Under model (1.1) we assume that we observe data on an increasing time span, $n \to \infty$, with fixed time duration between two consecutive observations. Another approach is sampling from (1.2) on a fixed time span, say $[0, 1]$, with an increasing sampling frequency. These two frameworks have different ranges of applicability. In many applications, we add today's S&P 500 Index value at the end of the existing sample, rather than finer observations between two existing ones. In such cases, the former framework could be a better choice. See Aït-Sahalia [1] and Bandi and Phillips [3] for more discussions.

We shall address the model validation problem of (1.1) by constructing nonparametric simultaneous confidence bands (SCB) for $\mu$ and $\sigma$. SCB are useful in testing whether $\mu$ and $\sigma$ are of certain parametric forms. For example, in model (1.1), interesting problems include testing whether $\mu$ is linear, quadratic, or of some other pattern, and whether $\sigma$ is nonconstant, that is, the existence of conditional heteroscedasticity. The mean regression function $\mu$ can be nonparametrically estimated by kernel, local linear, spline and wavelet methods. To construct an asymptotic SCB for $\mu(x)$ over the interval $x \in \mathcal{T} = [T_1, T_2]$ with level $100(1-\alpha)\%$, $\alpha \in (0, 1)$, we need to find two functions $l(\cdot) = l_n(\cdot)$ and $u(\cdot) = u_n(\cdot)$ based on the data $(X_i, Y_i)_{i=1}^n$ such that

$$\lim_{n\to\infty}\mathbb{P}\{l(x) \le \mu(x) \le u(x) \text{ for all } x \in \mathcal{T}\} = 1 - \alpha. \tag{1.3}$$

It is certainly more desirable to have (1.3) in a nonasymptotic sense, namely that the probability in (1.3) is exactly $1 - \alpha$. However, the latter problem is intractable since it is difficult to establish a finite sample distributional theory for nonparametric regression estimates. With the SCB, we can test whether $\mu$ is of a certain parametric form, $H_0: \mu = \mu_\theta$, where $\theta \in \Theta$ and $\Theta$ is a parameter space. For example, to test whether $\mu(x) = \beta_0 + \beta_1x$, we can apply the linear regression method to obtain an estimate $(\hat\beta_0, \hat\beta_1)$ of $(\beta_0, \beta_1)$ from the data $(X_i, Y_i)_{i=1}^n$, and then check whether $l(x) \le \hat\beta_0 + \hat\beta_1x \le u(x)$ holds for all $x \in \mathcal{T}$. If so, then we accept at level $\alpha$ the null hypothesis that $\mu$ is linear. Otherwise $H_0$ is rejected.
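Once a band is available, the acceptance rule just described is a pointwise comparison. The minimal sketch below assumes the band is given as lower and upper values on a finite set of points (the grid-based version of (1.3) used later in the paper works this way); `ols_line` and `line_inside_band` are hypothetical helper names.

```python
def ols_line(xs, ys):
    """Least-squares estimates (beta0, beta1) of a straight line through (xs, ys)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    return ybar - beta1 * xbar, beta1

def line_inside_band(grid, lower, upper, beta0, beta1):
    """Accept H0 (mu linear) iff l(x) <= beta0 + beta1 * x <= u(x) at every grid point."""
    return all(l <= beta0 + beta1 * x <= u for x, l, u in zip(grid, lower, upper))
```

The same check with a fitted quadratic or a constant in place of the line gives the tests for a quadratic $\mu$ or a constant $\sigma$.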

The construction of SCB $l$ and $u$ satisfying (1.3) has been a difficult problem if dependence is present. Assuming that $(X_i, Y_i)$ are independent random samples from a bivariate population, Johnston [26] obtained an asymptotic distributional theory for $\sup_{0\le x\le1}|\hat\mu(x) - \mathbb{E}[\hat\mu(x)]|$, where $\hat\mu(x)$ is the Nadaraya-Watson estimate of the mean regression function $\mu(x) = \mathbb{E}(Y \mid X = x)$. Johnston applied his limit theorem and constructed asymptotic SCB for $\mu$. Since his result is no longer valid if dependence is present, Johnston's procedure is not applicable in the time series setting. A key tool in Johnston's approach is Bickel and Rosenblatt's [4] asymptotic theory for maximal deviations of kernel density estimators. Bickel and Rosenblatt applied a deep result in probability theory, strong approximation, which asserts that normalized empirical processes of independent random variables can be approximated by Brownian bridges. Such a result generally does not exist under dependence. For other contributions under the independence assumption see Härdle [24], Knafl, Sacks and Ylvisaker [27], Hall and Titterington [23], Härdle and Marron [25], Eubank and Speckman [11], Sun and Loader [31], Xia [38], Cummins, Filloon and Nychka [7] and Dümbgen [9], among others.

In the fixed design case with $X_i = i/n$, Robinson [30] obtained a central limit theorem for nonparametric estimates of trend functions for dependent data. By applying Komlós et al.'s [28] strong invariance principle for partial sums, Eubank and Speckman [11] constructed SCB for $\mu$ with asymptotically correct coverage probabilities. Their method was extended to the time series setting by Wu and Zhao [37]. However, Wu and Zhao's result is not applicable here since it heavily relies on the fixed design assumption.

In this paper, we shall consider a variant of (1.3) and construct SCB over a subset $\mathcal{T}_n$ of $\mathcal{T}$ with $\mathcal{T}_n$ becoming denser as $n \to \infty$. A similar framework is also adopted in Bühlmann [5] and Knafl, Sacks and Ylvisaker [27] in other contexts. The latter paper concerns the construction of conservative SCB of regression functions for independent data with Gaussian errors over grid points. We show that our SCB has asymptotically correct coverage probabilities under a general dependence structure on $(X_i, Y_i)$, which allows applications in many linear and nonlinear processes. Our method can be applied to statistical inference problems in time series including goodness-of-fit, hypothesis testing and others. In the development of our asymptotic theory, we apply a deep martingale moderate deviation principle by Grama and Haeusler [19].

We now introduce some notation. For a random variable $Z$ write $Z \in \mathcal{L}^p$, $p > 0$, if $\|Z\|_p := [\mathbb{E}(|Z|^p)]^{1/p} < \infty$, and $\|Z\| = \|Z\|_2$. For $a, b \in \mathbb{R}$ let $a \wedge b = \min(a, b)$, $a \vee b = \max(a, b)$ and $\lceil a \rceil = \inf\{k \in \mathbb{Z} : k \ge a\}$. Let $\{a_n\}$ and $\{b_n\}$ be two real sequences. We write $a_n \asymp b_n$ if $|a_n/b_n|$ is bounded away from $0$ and $\infty$ for all large $n$. For $S \subset \mathbb{R}$ denote by $\mathcal{C}^p(S) = \{g(\cdot) : \sup_{x\in S}|g^{(k)}(x)| < \infty,\ k = 0, \ldots, p\}$ the set of functions having bounded derivatives on $S$ up to order $p \ge 1$, and by $\mathcal{C}^0(S)$ the set of continuous functions on $S$. Let $S^\epsilon = \bigcup_{y\in S}\{x : |x - y| \le \epsilon\}$ be the $\epsilon$-neighborhood of $S$, $\epsilon > 0$.

The rest of the paper is structured as follows. We introduce our dependence structure on $(X_i, Y_i, \varepsilon_i)$ in Section 2. Section 3 presents the main results: SCB for $\mu(\cdot)$ and $\sigma^2(\cdot)$ with asymptotically correct coverage probabilities are constructed in Sections 3.1 and 3.2, respectively. In Section 4, applications are made to two important cases of (1.1), nonlinear time series and linear processes, where we consider both short- and long-range dependent processes. In Section 5, we discuss some implementation issues and then perform a simulation study. Section 6 contains an application to the S&P 500 Index data. We defer the proofs to Section 7.

2. Dependence structure. In (1.1), assume that $\varepsilon_i$, $i \in \mathbb{Z}$, are i.i.d. and that $X_i$ is a stationary process

$$X_i = G(\mathcal{F}_i), \quad \text{where } \mathcal{F}_i = (\ldots, \eta_{i-1}, \eta_i). \tag{2.1}$$

Here $\eta_i$, $i \in \mathbb{Z}$, are i.i.d. and $G$ is a measurable function such that $X_i$ is well defined. The framework (2.1) is very general (Priestley [29], Tong [32], Wu [35]). Assume that $\varepsilon_i$ is independent of $\mathcal{F}_i$ and $\eta_i$ is independent of $\varepsilon_j$, $j \le i - 2$.

Define the projection operator $\mathcal{P}_k$, $k \in \mathbb{Z}$, by $\mathcal{P}_kZ = \mathbb{E}(Z \mid \mathcal{F}_k) - \mathbb{E}(Z \mid \mathcal{F}_{k-1})$, $Z \in \mathcal{L}^1$. Let $F_X$ and $F_\varepsilon$ be the distribution functions of $X_0$ and $\varepsilon_0$, respectively; let $f_X = F_X'$ and $f_\varepsilon = F_\varepsilon'$ be their densities. Let $F_X(x \mid \mathcal{F}_i) = \mathbb{P}(X_{i+1} \le x \mid \mathcal{F}_i)$, $i \in \mathbb{Z}$, be the conditional distribution function of $X_{i+1}$ given $\mathcal{F}_i$ and $f_X(x \mid \mathcal{F}_i) = \partial F_X(x \mid \mathcal{F}_i)/\partial x$ the conditional density. Let

$$\theta_i = \sup_{x\in\mathbb{R}}\|\mathcal{P}_0f_X(x \mid \mathcal{F}_i)\| + \sup_{x\in\mathbb{R}}\|\mathcal{P}_0f_X'(x \mid \mathcal{F}_i)\|, \tag{2.2}$$

where $f_X'(x \mid \mathcal{F}_i) = \partial f_X(x \mid \mathcal{F}_i)/\partial x$ if it exists. For $n \in \mathbb{N}$ define

$$\Xi_n = n\Theta_{2n}^2 + \sum_{k=n}^{\infty}(\Theta_{n+k} - \Theta_k)^2, \quad \text{where } \Theta_n = \sum_{i=1}^{n}\theta_i. \tag{2.3}$$

Roughly speaking, $\theta_i$ measures the contribution of $\eta_0$ in predicting $X_{i+1}$ (Wu [35]). If $\Theta_\infty < \infty$, then the cumulative contribution of $\eta_0$ in predicting future values is finite, thus implying short-range dependence (SRD). In this case $\Xi_n = O(n)$. Our setting also allows long-range dependence (LRD). For example, let $\theta_i = i^{-\beta}\ell(i)$, where $\beta > 1/2$ and $\ell(\cdot)$ is a slowly varying function, namely $\lim_{x\to\infty}\ell(\lambda x)/\ell(x) = 1$ for all $\lambda > 0$. Note that $\tilde\ell(n) = \sum_{i=1}^{n}|\ell(i)|/i$ is also a slowly varying function. By Karamata's theorem,

$$\Xi_n = O(n), \quad O[n^{3-2\beta}\ell^2(n)] \quad \text{or} \quad O\{n[\tilde\ell(n)]^2\} \tag{2.4}$$

under $\beta > 1$ (SRD case), $\beta < 1$ (LRD case) or $\beta = 1$, respectively (see Wu [34]). In the LRD case $\Xi_n$ grows faster than $n$. In Section 4 we shall give bounds on $\Xi_n$ for SRD and LRD linear processes and some nonlinear time series.
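To make the growth rates in (2.4) concrete, the minimal Python sketch below evaluates $\Xi_n$ from (2.3) for the pure power case $\theta_i = i^{-\beta}$ (that is, $\ell \equiv 1$). The infinite sum over $k \ge n$ is truncated at $k = 50n$, an assumption made only to obtain a finite computation, so the values are approximations. For $\beta > 1$ the ratio $\Xi_n/n$ should stabilize, while for $\beta < 1$ it keeps growing, roughly like $n^{2-2\beta}$.

```python
from itertools import accumulate

def xi_n(n, beta, trunc=50):
    """Truncated evaluation of Xi_n in (2.3) with theta_i = i ** (-beta).

    The sum over k >= n is cut at k = trunc * n (an approximation).
    """
    kmax = trunc * n
    theta = [i ** (-beta) for i in range(1, n + kmax + 1)]
    big_theta = [0.0] + list(accumulate(theta))  # big_theta[m] = Theta_m, Theta_0 = 0
    head = n * big_theta[2 * n] ** 2
    tail = sum((big_theta[n + k] - big_theta[k]) ** 2 for k in range(n, kmax + 1))
    return head + tail
```

For instance, comparing `xi_n(400, beta) / 400` with `xi_n(200, beta) / 200` gives a ratio close to 1 for `beta = 2.0` (SRD) and a ratio well above 1 for `beta = 0.75` (LRD).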



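3. Main results. Let $\mathcal{T} = [T_1, T_2]$ be a bounded interval. We assume hereafter without loss of generality (WLOG) that $\mathbb{E}(\varepsilon_0) = 0$ and $\mathbb{E}(\varepsilon_0^2) = 1$, since otherwise model (1.1) can be re-parameterized by letting $\tilde\mu(x) = \mu(x) + \sigma(x)\mathbb{E}(\varepsilon_0)$, $\tilde\sigma(x) = c\sigma(x)$ and $\tilde\varepsilon_i = [\varepsilon_i - \mathbb{E}(\varepsilon_i)]/c$, where $c^2 = \mathbb{E}(\varepsilon_0^2) - [\mathbb{E}(\varepsilon_0)]^2$.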

3.1. Simultaneous confidence band for $\mu$. There exists a vast literature on nonparametric estimation of the regression function $\mu$. Here we use the Nadaraya-Watson estimator

$$\hat\mu_{b_n}(x) = \frac{1}{nb_n\hat f_X(x)}\sum_{i=1}^{n}Y_iK_{b_n}(x - X_i), \quad \text{where } \hat f_X(x) = \frac{1}{nb_n}\sum_{i=1}^{n}K_{b_n}(x - X_i). \tag{3.1}$$

Here and hereafter $K_{b_n}(u) = K(u/b_n)$, $K$ is a kernel function with $\int_{\mathbb{R}}K(u)\,du = 1$, and the bandwidth $b_n \to 0$ satisfies $nb_n \to \infty$. One can also use the local linear estimator in Fan and Gijbels [15] and derive similar results. In this paper we have decided to use the local constant estimator (3.1) instead of the local linear estimator since the latter involves more tedious theoretical derivations. In Definition 1 below, some regularity conditions on $K$ are imposed. Theorem 1 asserts a central limit theorem (CLT) for $\hat\mu_{b_n}(x)$, which can be used to construct pointwise confidence intervals for $\mu(x)$.

Definition 1. Let $\mathcal{K}$ be the set of kernels which are bounded, symmetric, and have bounded derivative and bounded support. Let $\psi_K = \int_{\mathbb{R}}u^2K(u)\,du/2$ and $\varphi_K = \int_{\mathbb{R}}K^2(u)\,du$.
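A direct transcription of (3.1) may help fix ideas. The minimal Python sketch below uses the quartic (biweight) kernel $K(u) = \frac{15}{16}(1 - u^2)^2$ on $[-1, 1]$, one convenient member of $\mathcal{K}$ (our choice, not prescribed by the paper), for which $\psi_K = 1/14$ and $\varphi_K = 5/7$. The factor $1/(nb_n)$ cancels in the ratio, so $\hat\mu_{b_n}(x)$ is computed as a kernel-weighted average of the $Y_i$.

```python
import numpy as np

def biweight(u):
    """Quartic (biweight) kernel on [-1, 1]: bounded, symmetric, with bounded derivative."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nadaraya_watson(x, X, Y, b, kernel=biweight):
    """Evaluate mu_hat_b(x) and f_hat(x) of (3.1) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    W = kernel((x[:, None] - X[None, :]) / b)       # W[j, i] = K((x_j - X_i) / b)
    f_hat = W.sum(axis=1) / (len(X) * b)            # kernel density estimate of f_X
    with np.errstate(invalid="ignore", divide="ignore"):
        mu_hat = (W @ Y) / W.sum(axis=1)            # NaN where no X_i lies within b of x
    return mu_hat, f_hat
```

The dense weight matrix keeps the code close to the formula; for large samples one would restrict each evaluation point to the observations inside its window.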

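Theorem 1. Let $x \in \mathbb{R}$ be fixed and $K \in \mathcal{K}$. Assume that $f_X(x) > 0$, $\sigma(x) > 0$ and $f_X, \mu \in \mathcal{C}^4(\{x\}^\epsilon)$ for some $\epsilon > 0$. Further assume that

$$nb_n^9 + \frac{1}{nb_n} + \cdots \to 0. \tag{3.2}$$

Let $\rho_\mu(x) = \mu''(x) + 2\mu'(x)f_X'(x)/f_X(x)$. Then as $n \to \infty$,

$$\sqrt{\frac{nb_nf_X(x)}{\varphi_K}}\;\frac{\hat\mu_{b_n}(x) - \mu(x) - b_n^2\psi_K\rho_\mu(x)}{\sigma(x)} \Rightarrow N(0, 1).$$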

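Theorem 2. Let $\varepsilon_0 \in \mathcal{L}^3$, $\mathcal{T} = [T_1, T_2]$ and $K \in \mathcal{K}$. Assume that $\inf_{x\in\mathcal{T}}f_X(x) > 0$, $\inf_{x\in\mathcal{T}}\sigma(x) > 0$, and $f_X, \mu \in \mathcal{C}^4(\mathcal{T}^\epsilon)$, $\sigma \in \mathcal{C}^2(\mathcal{T}^\epsilon)$ for some $\epsilon > 0$. Further assume

$$nb_n^9\log n + \frac{(\log n)^3}{nb_n^3} + \Xi_n\Big[\frac{b_n^3\log n}{n} + \frac{(\log n)^2}{n^2b_n^{4/3}}\Big] \to 0. \tag{3.3}$$

Let $\rho_\mu(x)$ be as in Theorem 1. For $n \ge 2$ define

$$B_n(z) = \sqrt{2\log n} - \frac{\frac{1}{2}\log\log n + \log(2\sqrt{\pi})}{\sqrt{2\log n}} + \frac{z}{\sqrt{2\log n}}. \tag{3.4}$$

Let the kernel $K$ have support $[-k_0, k_0]$; let $\mathcal{T}_n = \{x_j = T_1 + 2k_0b_nj,\ j = 0, 1, \ldots, m_n - 1\}$ and $m_n = \lceil(T_2 - T_1)/(2k_0b_n)\rceil$. Then for every $z \in \mathbb{R}$,

$$\lim_{n\to\infty}\mathbb{P}\Big\{\sup_{x\in\mathcal{T}_n}\sqrt{\frac{nb_nf_X(x)}{\varphi_K}}\;\frac{|\hat\mu_{b_n}(x) - \mu(x) - b_n^2\psi_K\rho_\mu(x)|}{\sigma(x)} \le B_{m_n}(z)\Big\} = e^{-2e^{-z}}.$$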

Observe that $\mathcal{T}_n$ becomes denser in $\mathcal{T}$ as $b_n \to 0$. Hence, if the regression function $\mu$ is sufficiently smooth, then $\{\mu(x) : x \in \mathcal{T}\}$ can be well approximated by $\{\mu(x) : x \in \mathcal{T}_n\}$ for large $n$. Theorem 2 is useful to construct SCB in an approximate version of (1.3):

$$\lim_{n\to\infty}\mathbb{P}\{l(x) \le \mu(x) \le u(x) \text{ for all } x \in \mathcal{T}_n\} = 1 - \alpha. \tag{3.5}$$

Specifically, let $\hat\sigma_n(x)$ [resp. $\hat\rho_\mu(x)$] be an estimate of $\sigma(x)$ [resp. $\rho_\mu(x)$] such that $\sup_{x\in\mathcal{T}}|\hat\sigma_n(x) - \sigma(x)| = o_{\mathbb{P}}[(\log n)^{-1}]$ and $\sup_{x\in\mathcal{T}}|\hat\rho_\mu(x) - \rho_\mu(x)| = o_{\mathbb{P}}[(nb_n^5\log n)^{-1/2}]$. By Slutsky's theorem, Theorem 2 still holds if $\sigma$ and $\rho_\mu$ therein are replaced by $\hat\sigma_n$ and $\hat\rho_\mu$, respectively. Hence, in the sense of (3.5), an asymptotic $100(1-\alpha)\%$ SCB for $\mu$ can be constructed as

$$\hat\mu_{b_n}(x) - b_n^2\psi_K\hat\rho_\mu(x) \pm \hat\sigma_n(x)\sqrt{\frac{\varphi_K}{nb_n\hat f_X(x)}}\,B_{m_n}(z_\alpha), \quad \text{where } z_\alpha = -\log\log[(1-\alpha)^{-1/2}]. \tag{3.6}$$
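The grid $\mathcal{T}_n$ of Theorem 2 and the limits in (3.6) are simple to assemble once the ingredients are available. In the minimal sketch below the cutoff (for instance $B_{m_n}(z_\alpha)$), the bias-adjusted centres $\hat\mu_{b_n}(x) - b_n^2\psi_K\hat\rho_\mu(x)$ and the estimates $\hat\sigma_n$, $\hat f_X$ on the grid are assumed to be supplied by the caller; `grid_T_n` and `scb_on_grid` are hypothetical names.

```python
import math

def grid_T_n(T1, T2, k0, b):
    """Grid x_j = T1 + 2*k0*b*j, j = 0, ..., m_n - 1, with m_n = ceil((T2 - T1) / (2*k0*b))."""
    ratio = (T2 - T1) / (2.0 * k0 * b)
    # round first so that, e.g., 2.2 / 0.2 = 11.000000000000002 is not pushed up to 12
    m_n = math.ceil(round(ratio, 9))
    return [T1 + 2.0 * k0 * b * j for j in range(m_n)]

def scb_on_grid(centers, sigma_hat, f_hat, n, b, phi_K, cutoff):
    """Lower and upper limits of (3.6) at each grid point.

    centers[j]   : bias-adjusted estimate of mu at x_j
    sigma_hat[j] : estimate of sigma(x_j); f_hat[j]: estimate of f_X(x_j)
    cutoff       : e.g. B_{m_n}(z_alpha)
    """
    lower, upper = [], []
    for c, s, f in zip(centers, sigma_hat, f_hat):
        half = s * math.sqrt(phi_K / (n * b * f)) * cutoff
        lower.append(c - half)
        upper.append(c + half)
    return lower, upper
```

Because neighbouring grid points are $2k_0b_n$ apart, the kernel windows at different grid points do not overlap, which is what makes the maximum over $\mathcal{T}_n$ behave like a maximum of asymptotically independent terms.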

In (3.6), $\rho_\mu(x)$ cannot be easily estimated since it involves the unknown functions $\mu''$, $\mu'$ and $f_X'$. Following Wu and Zhao [37], we adopt the simple jackknife-type bias correction procedure, which avoids estimating $\mu''$, $\mu'$ and $f_X'$:

$$\hat\mu^*_{b_n}(x) = 2\hat\mu_{b_n}(x) - \hat\mu_{\sqrt{2}b_n}(x). \tag{3.7}$$

Then the bias term $O(b_n^2)$ in $\hat\mu_{b_n}$ reduces to $O(b_n^4)$ in $\hat\mu^*_{b_n}$. Using (3.7) is equivalent to using the 4th-order kernel $K^*(u) = 2K(u) - K(u/\sqrt{2})/\sqrt{2}$ in (3.1). Clearly, $K^* \in \mathcal{K}$ has support $[-\sqrt{2}k_0, \sqrt{2}k_0]$ and $\psi_{K^*} = 0$. Let $m_n^* = \lceil(T_2 - T_1)/(2\sqrt{2}k_0b_n)\rceil$ and $\mathcal{T}_n^* = \{x_j^* = T_1 + 2\sqrt{2}k_0b_nj,\ j = 0, 1, \ldots, m_n^* - 1\}$. Then Theorem 2 still holds with $\hat\mu_{b_n}$ (resp. $K, m_n, \mathcal{T}_n$) replaced by $\hat\mu^*_{b_n}$ (resp. $K^*, m_n^*, \mathcal{T}_n^*$).
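The two claims about $K^*$, that it integrates to one and that $\psi_{K^*} = 0$, can be checked numerically. The minimal sketch below does so for the biweight kernel with $k_0 = 1$ (an illustrative choice) using a midpoint rule, and also writes the combination (3.7) generically for any estimator that takes a bandwidth argument.

```python
import math

def biweight(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def jackknife_kernel(K):
    """K*(u) = 2 K(u) - K(u / sqrt(2)) / sqrt(2), the kernel implied by (3.7)."""
    r = math.sqrt(2.0)
    return lambda u: 2.0 * K(u) - K(u / r) / r

def moment(K, p, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of u**p * K(u) over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum((lo + (j + 0.5) * h) ** p * K(lo + (j + 0.5) * h) for j in range(steps))

def jackknife(estimate, x, b):
    """Bias-corrected value 2 * estimate(x, b) - estimate(x, sqrt(2) * b), as in (3.7)."""
    return 2.0 * estimate(x, b) - estimate(x, math.sqrt(2.0) * b)

Kstar = jackknife_kernel(biweight)
r2 = math.sqrt(2.0)
mass = moment(Kstar, 0, -r2, r2)       # should be close to 1
second = moment(Kstar, 2, -r2, r2)     # should be close to 0, i.e. psi_{K*} = 0
```

The `jackknife` helper makes the mechanism visible: for a toy estimate of the form $\mu(x) + cb^2$, the $b^2$ terms cancel exactly.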

In Theorem 2, (3.3) imposes conditions on the bandwidth $b_n$ and the strength of the dependence. The first part, $nb_n^9\log n \to 0$, controls the bias by requiring that $b_n$ not be too large, while the second one, $(\log n)^3/(nb_n^3) \to 0$, requires that $b_n$ not be too small, thus ensuring the validity of the moderate deviation principle (see the proof of Theorem 5). The third part requires that the dependence not be too strong. For SRD processes, we have $\Xi_n = O(n)$ and the third term in (3.3) is automatically $o(1)$ if the first two are $o(1)$. If $b_n \asymp n^{-\beta}$ with $\beta \in (1/9, 1/3)$, then the first two terms in (3.3) are $o(1)$. In particular, (3.3) allows $\beta = 1/5$, which corresponds to the mean square error (MSE)-optimal bandwidth. Interestingly, (3.3) also allows long-range dependent processes; see Section 4.2.

3.2. Simultaneous confidence band for $\sigma^2$. Let $\hat\mu^*_{b_n}$ be as in (3.7). Since $\mathbb{E}(\varepsilon_i^2) = 1$ and $\mathbb{E}\{[Y_i - \mu(X_i)]^2 \mid X_i = x\} = \sigma^2(x)$, a natural residual-based estimator of $\sigma^2(x)$ is

$$\hat\sigma^2_{h_n}(x) = \frac{1}{nh_n\tilde f_X(x)}\sum_{i=1}^{n}[Y_i - \hat\mu^*_{b_n}(X_i)]^2K_{h_n}(x - X_i), \quad \text{where } \tilde f_X(x) = \frac{1}{nh_n}\sum_{i=1}^{n}K_{h_n}(x - X_i). \tag{3.8}$$

Here $h_n$ is another bandwidth, which can be different from the bandwidth $b_n$ used in estimating $\mu$. Similar methods are applied in Fan and Yao [16] and Hall and Carroll [22].

Remark 1. A referee pointed out that, as shown in Fan and Yao [16], it is not necessary to correct the bias in estimating the mean function $\mu$. In fact, if one uses the original estimate $\hat\mu_{b_n}$, then the bias term $O(b_n^2)$ on the right-hand side of (7.23) is squared in (7.24) and becomes $O(b_n^4)$, which does not affect the subsequent proofs.
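A compact sketch of (3.8) follows, assuming the biweight kernel and a Nadaraya-Watson smoother for both steps. Residuals are formed with the jackknife estimate (3.7) at the design points and then smoothed with the second bandwidth $h_n$; by Remark 1, using the uncorrected fit for the residuals would also be acceptable. Dense $n \times n$ weight matrices are used for clarity, so the sketch only suits moderate $n$.

```python
import numpy as np

def biweight(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on [-1, 1] (an illustrative member of K)."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nw(x, X, Y, b):
    """Nadaraya-Watson smoother of Y on X, evaluated at the points x."""
    W = biweight((np.atleast_1d(x)[:, None] - X[None, :]) / b)
    return (W @ Y) / W.sum(axis=1)

def sigma2_hat(x, X, Y, b, h):
    """Residual-based estimator (3.8) built on the jackknife mean estimate (3.7)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # mu*_b(X_i) = 2 mu_b(X_i) - mu_{sqrt(2) b}(X_i); each X_i gets positive weight from itself
    mu_star = 2.0 * nw(X, X, Y, b) - nw(X, X, Y, np.sqrt(2.0) * b)
    squared_resid = (Y - mu_star) ** 2
    # smooth the squared residuals with the second bandwidth h
    return nw(x, X, squared_resid, h)
```

Replacing `mu_star` by the true $\mu$ gives the oracle version of the estimator discussed after Theorem 3.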



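Proposition 1. Let $K \in \mathcal{K}$, $\varepsilon_0 \in \mathcal{L}^6$ and $h_n \asymp b_n$. Assume that $\inf_{x\in\mathcal{T}}f_X(x) > 0$, $\inf_{x\in\mathcal{T}}\sigma(x) > 0$ and $f_X, \mu \in \mathcal{C}^4(\mathcal{T}^\epsilon)$ for some $\epsilon > 0$. Further assume that

$$h_n^{3/2}\log n + \cdots \to 0. \tag{3.9}$$

Then

$$\sup_{x\in\mathcal{T}}|\hat\sigma^2_{h_n}(x) - \sigma^2(x)| = O_{\mathbb{P}}\Big[h_n^2 + \Big(\frac{\log n}{nh_n}\Big)^{1/2} + \Big(\frac{\log n}{n^3h_n^7}\Big)^{1/4} + \frac{h_n\Xi_n^{1/2}}{n}\Big].$$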

Proposition 1 provides a uniform error bound for the estimate $\hat\sigma^2_{h_n}(\cdot)$. From the proofs of Proposition 1 and Theorem 3, one can, as in Theorem 1, establish a CLT for $\hat\sigma^2_{h_n}(x)$ for each fixed $x$, and the optimal bandwidth is $h_n \asymp n^{-1/5}$. We omit the details. In Proposition 1, if one uses the optimal bandwidth $h_n \asymp n^{-1/5}$, then $\sup_{x\in\mathcal{T}}|\hat\sigma^2_{h_n}(x) - \sigma^2(x)| = O_{\mathbb{P}}[n^{-2/5}(\log n)^{1/2} + \Xi_n^{1/2}n^{-6/5}]$. The first part, $O_{\mathbb{P}}[n^{-2/5}(\log n)^{1/2}]$, of the error bound is optimal in nonparametric curve estimation for independent data. The second part accounts for dependence, and it can be absorbed into the first one if $\Xi_n = O(n^{8/5})$. In particular, this holds for SRD processes, for which $\Xi_n = O(n)$.

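Theorem 3 below presents a maximal deviation result for $\hat\sigma^2_{h_n}$, and it can be used to construct SCB for $\sigma^2$.

Theorem 3. Let the conditions in Proposition 1 be fulfilled. Further assume that $\sigma \in \mathcal{C}^4(\mathcal{T}^\epsilon)$ for some $\epsilon > 0$ and

$$nh_n^9\log n + \frac{(\log n)^3}{nh_n^3} + \Xi_n\Big[\frac{h_n^3\log n}{n} + \frac{(\log n)^2}{n^2h_n^{4/3}}\Big] \to 0. \tag{3.10}$$

Let the kernel $K$ have support $[-k_0, k_0]$; let $\mathcal{T}_n = \{x_j = T_1 + 2k_0h_nj,\ j = 0, 1, \ldots, m_n - 1\}$ and $m_n = \lceil(T_2 - T_1)/(2k_0h_n)\rceil$. Let $B_n(z)$ be as in (3.4). Then for every $z \in \mathbb{R}$,

$$\lim_{n\to\infty}\mathbb{P}\Big\{\sup_{x\in\mathcal{T}_n}\sqrt{\frac{nh_nf_X(x)}{\varphi_K\nu_\varepsilon}}\;\frac{|\hat\sigma^2_{h_n}(x) - \sigma^2(x) - h_n^2\psi_K\rho_\sigma(x)|}{\sigma^2(x)} \le B_{m_n}(z)\Big\} = e^{-2e^{-z}},$$

where $\nu_\varepsilon = \mathbb{E}(\varepsilon_0^4) - 1 > 0$ and

$$\rho_\sigma(x) = 2\sigma'(x)^2 + 2\sigma(x)\sigma''(x) + 4\sigma(x)\sigma'(x)f_X'(x)/f_X(x).$$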



If $\mu$ were known and we used $\hat\sigma^2_{h_n}$ in (3.8) to estimate $\sigma^2$ with $\hat\mu^*_{b_n}$ therein replaced by the true function $\mu$, then Theorem 3 would still be applicable. So Theorem 3 implies the oracle property that the construction of SCB for $\sigma^2$ does not heavily rely on the estimation of $\mu$. Under strong mixing conditions, Fan and Yao [16] obtained a similar oracle property for their local linear estimator of $\sigma^2$. For independent data, Hall and Carroll [22] considered the effect of estimating the mean on variance function estimation.

As in (3.7), we propose the bias-corrected estimate $\hat\sigma^{2*}_{h_n}(x) = 2\hat\sigma^2_{h_n}(x) - \hat\sigma^2_{\sqrt{2}h_n}(x)$. Similarly as in Section 3.1, we can define $m_n^*$ and $\mathcal{T}_n^*$ accordingly, and Theorem 3 still holds with $\hat\sigma^2_{h_n}$ (resp. $K, m_n, \mathcal{T}_n$) replaced by $\hat\sigma^{2*}_{h_n}$ (resp. $K^*, m_n^*, \mathcal{T}_n^*$).

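3.3. Estimation of $\nu_\varepsilon$ in Theorem 3. To apply Theorem 3, one needs to estimate $\nu_\varepsilon = \mathbb{E}(\varepsilon_0^4) - 1$. Here we estimate $\nu_\varepsilon$ by

$$\hat\nu_\varepsilon = \frac{\sum_{i=1}^{n}\hat\varepsilon_i^4\mathbf{1}_{X_i\in\mathcal{T}}}{\sum_{i=1}^{n}\mathbf{1}_{X_i\in\mathcal{T}}} - 1, \quad \text{where } \hat\varepsilon_i = \frac{Y_i - \hat\mu^*_{b_n}(X_i)}{[\hat\sigma^{2*}_{h_n}(X_i)]^{1/2}},\ i = 1, 2, \ldots, n. \tag{3.11}$$

Here $\hat\varepsilon_i$ are the estimated residuals for model (1.1). The naive estimate $n^{-1}\sum_{i=1}^{n}\hat\varepsilon_i^4 - 1$ does not perform well in practice since $\hat\sigma^2_{h_n}(x)$ behaves poorly if $|x|$ is too large. Truncation by $\mathcal{T}$ improves the performance.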
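Given fitted mean and variance functions, (3.11) is a truncated sample moment. In the minimal sketch below `mu_star` and `sigma2_star` are assumed to be callables returning $\hat\mu^*_{b_n}$ and $\hat\sigma^{2*}_{h_n}$ at the design points; `nu_hat` is a hypothetical name. For standard normal errors the target value is $\nu_\varepsilon = 3 - 1 = 2$.

```python
import numpy as np

def nu_hat(X, Y, mu_star, sigma2_star, T1, T2):
    """Truncated estimator (3.11) of nu = E(eps^4) - 1, using only X_i in [T1, T2]."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    inside = (X >= T1) & (X <= T2)                       # indicator 1{X_i in T}
    eps = (Y[inside] - mu_star(X[inside])) / np.sqrt(sigma2_star(X[inside]))
    return float(np.mean(eps**4) - 1.0)                  # average over the kept X_i only
```

Averaging only over the retained observations is what the truncation by $\mathcal{T}$ amounts to: residuals at extreme $X_i$, where the variance fit is least reliable, never enter the fourth moment.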



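Proposition 2. Assume that the conditions in Proposition 1 are satisfied. Then

$$\hat\nu_\varepsilon - \nu_\varepsilon = O_{\mathbb{P}}\Big[n^{-1/3} + \frac{1}{nh_n^2} + \Big(\frac{\log n}{nh_n}\Big)^{1/2} + \Big(\frac{\log n}{n^3h_n^7}\Big)^{1/4} + \frac{\Xi_n^{1/2}}{n}\Big]. \tag{3.12}$$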

By Proposition 2, when one chooses the MSE-optimal bandwidths $b_n \asymp h_n \asymp n^{-1/5}$ and assumes $\Xi_n = O(n^{4/3})$, then $\hat\nu_\varepsilon - \nu_\varepsilon = O_{\mathbb{P}}(n^{-1/3})$. By Slutsky's theorem, the convergence in Theorem 3 still holds when $\nu_\varepsilon$ therein is replaced by $\hat\nu_\varepsilon$. It is expected that, under stronger regularity conditions, one can obtain the better bound $\hat\nu_\varepsilon - \nu_\varepsilon = O_{\mathbb{P}}(n^{-p})$ with $p > 1/3$. We do not pursue this direction here since the calculations involved would be tedious and since the bound $\hat\nu_\varepsilon - \nu_\varepsilon = o_{\mathbb{P}}[(\log n)^{-1}]$ suffices for our purpose.

4. Examples. To apply Theorems 2 and 3, we need to deal with $\Xi_n$ defined in (2.3). Let $\eta_0', \eta_i$, $i \in \mathbb{Z}$, be i.i.d.; let $\mathcal{F}_i^* = (\mathcal{F}_{-1}, \eta_0', \eta_1, \ldots, \eta_i)$. By Theorem 1 in Wu [35], we have

$$\theta_i \le \omega_i := \sup_x\|f_X(x \mid \mathcal{F}_i) - f_X(x \mid \mathcal{F}_i^*)\| + \sup_x\|f_X'(x \mid \mathcal{F}_i) - f_X'(x \mid \mathcal{F}_i^*)\|. \tag{4.1}$$

For many processes there exist simple and easy-to-use bounds for $\omega_i$. Here we shall consider linear processes and some popular nonlinear time series models.



4.1. Short-range dependent linear processes. Let $\eta_i$, $i \in \mathbb{Z}$, be i.i.d. with $\eta_0 \in \mathcal{L}^q$, $q > 0$, and $\mathbb{E}(\eta_0) = 0$ if $q \ge 1$. For a real sequence $(a_j)_{j\ge0}$ satisfying $\sum_{j=0}^{\infty}|a_j|^{q\wedge2} < \infty$, the process

$$X_i = \sum_{j=0}^{\infty}a_j\eta_{i-j} \tag{4.2}$$

is well defined and stationary. Special cases of (4.2) include ARMA models. Assume WLOG that $a_0 = 1$. Let $\tilde X_i = X_i - \eta_i$ and $\tilde X_i^* = \tilde X_i + a_i(\eta_0' - \eta_0)$. Then $f_X(x \mid \mathcal{F}_{i-1}) = f_\eta(x - \tilde X_i)$ and $f_X(x \mid \mathcal{F}_{i-1}^*) = f_\eta(x - \tilde X_i^*)$, where $f_\eta$ is the density of $\eta_0$. Assume that $f_\eta \in \mathcal{C}^2(\mathbb{R})$. Simple calculations show that $\theta_i = O(|a_i|^{q'})$, where $q' = (q \wedge 2)/2$; see Proposition 2 in Zhao and Wu [39]. Therefore we have $\Xi_n = O(n)$ if $\sum_{i=1}^{\infty}|a_i|^{q'} < \infty$. If $q \ge 2$, then the latter condition becomes $\sum_{i=1}^{\infty}|a_i| < \infty$. For causal ARMA processes, $a_i \to 0$ geometrically quickly. Note that our setting allows heavy-tailed $\eta_i$.

4.2. Long-range dependent linear processes. Consider the linear process (4.2) with $a_i = i^{-\alpha}\ell(i)$, where $\alpha > 1/(2q')$, $q' = (q \wedge 2)/2$, and $\ell(\cdot)$ is a slowly varying function. The case $\alpha q' > 1$ is covered by Section 4.1. Assume $\alpha q' \in (1/2, 1]$. If $q > 2$ and $\alpha \in (1/2, 1)$, then by Karamata's theorem the covariances $\mathbb{E}(X_0X_n)$ are of order $n^{1-2\alpha}\ell^2(n)$ and not summable; hence $(X_i)$ is long-range dependent. As in Section 4.1, $\theta_i = O[i^{-\alpha q'}\ell^{q'}(i)]$. By (2.4), $\Xi_n = O[n^{3-2\alpha q'}\ell^{2q'}(n)]$ if $\alpha q' \in (1/2, 1)$ and $\Xi_n = O\{n[\sum_{i=1}^{n}|\ell(i)|^{q'}/i]^2\}$ if $\alpha q' = 1$.

If $\alpha q' \in (17/26, 1]$, then (3.3) and (3.10) hold if $b_n \asymp h_n \asymp n^{-\beta}$ and $\beta$ satisfies

$$\max\Big\{\frac{1}{9}, \frac{2(1 - \alpha q')}{3}\Big\} < \beta < \min\Big\{\frac{1}{3}, \frac{3(2\alpha q' - 1)}{4}\Big\}. \tag{4.3}$$

Since $\alpha q' \in (17/26, 1]$, such $\beta$ exists. So under (4.3), Theorems 2 and 3 are applicable.
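The existence claim can be checked with exact rational arithmetic. The minimal sketch below evaluates the two endpoints of the interval $\max\{1/9, 2(1-\gamma)/3\} < \beta < \min\{1/3, 3(2\gamma-1)/4\}$ as functions of $\gamma = \alpha q'$; the endpoints meet at $\gamma = 17/26$, where both equal $3/13$, and the interval is non-empty for larger $\gamma$. `beta_interval` is a hypothetical name.

```python
from fractions import Fraction

def beta_interval(gamma):
    """Endpoints (lower, upper) of the admissible bandwidth exponents, gamma = alpha * q'.

    Admissible exponents beta form the open interval (lower, upper),
    which is non-empty iff lower < upper.
    """
    gamma = Fraction(gamma)
    lower = max(Fraction(1, 9), Fraction(2, 3) * (1 - gamma))
    upper = min(Fraction(1, 3), Fraction(3, 4) * (2 * gamma - 1))
    return lower, upper
```

For example, `beta_interval(Fraction(7, 10))` returns `(Fraction(1, 5), Fraction(3, 10))`, so under that degree of dependence any $\beta \in (1/5, 3/10)$, including the MSE-optimal $\beta = 1/5$ boundary's neighbours, would be admissible.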

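Example 1. Let $\alpha(z) = 1 - \sum_{i=1}^{k}\alpha_iz^i$ and $\beta(z) = 1 + \sum_{i=1}^{p}\beta_iz^i$ be two polynomials, where $\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_p$ are real coefficients. Denote by $\mathbb{B}$ the backward shift operator: $\mathbb{B}^jX_n = X_{n-j}$, $j \ge 0$. Consider the FARIMA$(k, d, p)$ process $X_n$ defined by $\alpha(\mathbb{B})(1 - \mathbb{B})^dX_n = \beta(\mathbb{B})\varepsilon_n$, $d \in (-1/2, 1/2)$. Let $\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt$ be the gamma function. In the simple case $p = k = 0$, we have $X_n = \sum_{i=0}^{\infty}a_i\varepsilon_{n-i}$, where

$$a_i = \frac{\Gamma(i + d)}{\Gamma(i + 1)\Gamma(d)} \sim \frac{i^{d-1}}{\Gamma(d)} \quad \text{as } i \to \infty. \tag{4.4}$$

If $d \in (0, 1/2)$, then $X_n$ is long-range dependent. More generally, it can be shown that a relation of the form (4.4), $a_i \sim c\,i^{d-1}$ with a constant $c$ depending on the ARMA coefficients, holds for general FARIMA$(k, d, p)$ processes if $\alpha(z) \ne 0$ for all complex $|z| \le 1$.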
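The coefficients in (4.4) satisfy $a_0 = 1$ and $a_i/a_{i-1} = (i - 1 + d)/i$, which gives an overflow-free way to compute them. The minimal Python sketch below uses this recursion and compares $a_i$ with the asymptotic form $i^{d-1}/\Gamma(d)$; $d = 0.3$ is an illustrative value in the long-range dependent range.

```python
import math

def farima_ma_coefficients(d, n_terms):
    """MA(infinity) weights a_i = Gamma(i + d) / (Gamma(i + 1) Gamma(d)) of FARIMA(0, d, 0).

    Uses a_0 = 1 and a_i = a_{i-1} * (i - 1 + d) / i instead of evaluating gamma
    functions directly, which would overflow for large i.
    """
    a = [1.0]
    for i in range(1, n_terms):
        a.append(a[-1] * (i - 1 + d) / i)
    return a

d = 0.3
a = farima_ma_coefficients(d, 10_001)
i = 10_000
ratio = a[i] / (i ** (d - 1) / math.gamma(d))   # tends to 1 as i grows
```

With $d = 0.3$ the weights decay like $i^{-0.7}$, so $\sum_i |a_i| = \infty$ while $\sum_i a_i^2 < \infty$, which is exactly the long-range dependent regime of Section 4.2.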

4.3. Nonlinear AR models. Consider a special case of (1.1) with $X_i = Y_{i-1}$ and $\eta_i = \varepsilon_{i-1}$. Then it becomes the nonlinear AR model

$$X_{i+1} = \mu(X_i) + \sigma(X_i)\eta_{i+1}. \tag{4.5}$$

Special cases of (4.5) include linear AR, ARCH, TAR and EAR processes. Denote by $f_\eta$ the density of $\eta_0$. Assume $\eta_0 \in \mathcal{L}^q$ and $\sup_{x\in\mathbb{R}}(1 + |x|)[|f_\eta'(x)| + |f_\eta''(x)|] < \infty$. As in Zhao and Wu [39], we have $\theta_i = O(r^i)$ with $r \in (0, 1)$, and hence $\Xi_n = O(n)$, provided that

$$\inf_x\sigma(x) > 0, \quad \sup_x[|\mu'(x)| + |\sigma'(x)|] < \infty \quad \text{and} \quad \sup_x\|\mu'(x) + \sigma'(x)\eta_0\|_q < 1. \tag{4.6}$$


Example 2. Consider the ARCH model $X_n = \eta_n(a^2 + b^2X_{n-1}^2)^{1/2}$, where $\eta_i$, $i \in \mathbb{Z}$, are i.i.d. and $a, b$ are real parameters. If $\eta_0 \in \mathcal{L}^q$ and $|b|\,\|\eta_0\|_q < 1$, then (4.6) holds.

5. A simulation study. In this section we present a simulation study of the performance of the SCB constructed in Section 3. Let $\varepsilon_i$, $i \in \mathbb{Z}$, be i.i.d. standard normal random variables. We shall consider the following two models:

Model 1: AR(1) $Y_i = \mu(Y_{i-1}) + s\varepsilon_i$, $i = 1, 2, \ldots, n$.

Model 2: ARCH(1) $Y_i = \sigma(Y_{i-1})\varepsilon_i$, $i = 1, 2, \ldots, n$.

Here $\mu$ and $\sigma$ are the functions of interest and $s > 0$ is the scale parameter. Model 1 is a nonlinear AR model and Model 2 is an ARCH model.

The jackknife bias-correction scheme reduces bias and allows one to choose a relatively larger bandwidth. On the other hand, a larger bandwidth results in relatively fewer grid points in $\mathcal{T}_n$, and consequently a less accurate approximation of $\{\mu(x) : x \in \mathcal{T}\}$ by $\{\mu(x) : x \in \mathcal{T}_n\}$. In our simulation, we tried different bandwidths and different sets $\mathcal{T}^{(k)}$, $k = 20, 30$ and $50$, of grid points to assess the performance of our SCB. Here $\mathcal{T}^{(k)}$ denotes the set containing $k$ grid points evenly spaced over $\mathcal{T}$, regardless of the bandwidth. The three choices of $k$ have similar performance, and the results reported below are for $\mathcal{T}^{(20)}$.

To construct SCB based on Theorem 2, one can use the cutoff value $q_\alpha = B_{m_n}(z_\alpha)$, where $z_\alpha = -\log\log[(1-\alpha)^{-1/2}]$ is the $(1-\alpha)$th quantile of the limiting extreme value distribution. Since the convergence to the extreme value distribution is quite slow, we propose a finite sample approximation scheme to compute the cutoff value. Let $Z_i$, $1 \le i \le m_n$, be i.i.d. standard normal random variables and $B_{m_n}(z)$ be as in Theorem 2. Then

$$\lim_{n\to\infty}\mathbb{P}\Big(\sup_{1\le i\le m_n}|Z_i| \le B_{m_n}(z)\Big) = e^{-2e^{-z}}, \tag{5.1}$$

and the events in (3.5) and (5.1) have the same limiting probability. So we propose the finite sample cutoff value $q_\alpha^*$ defined by

$$\mathbb{P}\Big(\sup_{1\le i\le m_n}|Z_i| \le q_\alpha^*\Big) = 1 - \alpha, \quad \text{or} \quad \mathbb{P}(|Z_1| \le q_\alpha^*) = (1 - \alpha)^{1/m_n}. \tag{5.2}$$

The difference between $q_\alpha$ and $q_\alpha^*$ is that $q_\alpha^*$ is the $(1-\alpha)$th quantile of $\sup_{1\le i\le m_n}|Z_i|$ for fixed $m_n$, while $q_\alpha$ is based on the limiting distribution. Thus, we expect the SCB based on $q_\alpha^*$ to outperform the one based on $q_\alpha$.
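Both cutoffs are available in closed form. The minimal sketch below computes $q_\alpha = B_{m_n}(z_\alpha)$ from (3.4) and $q_\alpha^*$ from (5.2), using `statistics.NormalDist` from the Python standard library for the normal quantile; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def B(m, z):
    """B_m(z) of (3.4)."""
    a = math.sqrt(2.0 * math.log(m))
    return a + (z - 0.5 * math.log(math.log(m)) - math.log(2.0 * math.sqrt(math.pi))) / a

def q_asymptotic(alpha, m):
    """q_alpha = B_m(z_alpha), z_alpha = -log log[(1 - alpha)^(-1/2)]."""
    z_alpha = -math.log(math.log((1.0 - alpha) ** -0.5))
    return B(m, z_alpha)

def q_finite(alpha, m):
    """q*_alpha of (5.2): P(|Z| <= q) = (1 - alpha)^(1/m) for Z ~ N(0, 1)."""
    p = (1.0 - alpha) ** (1.0 / m)
    return NormalDist().inv_cdf((1.0 + p) / 2.0)   # two-sided: P(|Z| <= q) = 2 Phi(q) - 1
```

For $\alpha = 0.05$ and $m_n = 20$ these give approximately 3.203 and 3.016, matching the values quoted below; the smaller $q_\alpha^*$ yields narrower bands, consistent with the over-coverage observed with $q_\alpha$.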

To assess the performance of our SCB, we generate $10^4$ realizations from Model 1 with $n = 2500$, $\mu(x) = 0.9\sin(x)$ and $s = 0.4$. Then (4.6) holds. Under this setting, simulations show that about 92-95% of the $Y_i$'s lie within the interval $[-1.1, 1.1]$. Thus we take $\mathcal{T} = [-1.1, 1.1]$. We construct an SCB for each realization, and the simulated coverage probabilities are the proportions of these $10^4$ SCBs that cover $\mu$. When applying the simulation procedure, we adopt the following technique for fitting $\hat\mu^*_{b_n}$ at $Y_i$, $0 \le i \le n$: we fit $\hat\mu^*_{b_n}$ at 300 grid points evenly spaced on the range of the $Y_i$'s and use the fitted value at the grid point nearest to $Y_i$ as $\hat\mu^*_{b_n}(Y_i)$. Doing this allows one to gain better smoothness since the original series $(Y_i)_{i=0}^n$ may be irregularly spaced. The result is reported in Table 1. The first row corresponds to different choices of the bandwidth $b_n$; the second and third rows are the simulated coverage probabilities of the constructed SCBs using the cutoff values $q_\alpha$ and $q_\alpha^*$, respectively. For $\alpha = 0.05$ and $m_n = 20$, $q_{0.05} = 3.203$ and $q_{0.05}^* = 3.016$. The coverage probabilities using $q_\alpha^*$ are relatively insensitive to the choice of bandwidth and very close to the nominal level 95%, while the coverage probabilities using $q_\alpha$ are systematically bigger.
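The nearest-grid-point device described above can be implemented with a sorted grid and a binary search. The minimal sketch below assumes the fitted curve has already been evaluated on a sorted grid (for example 300 points spanning the range of the $Y_i$); it is an illustration, not the authors' code, and `nearest_grid_values` is a hypothetical name.

```python
import numpy as np

def nearest_grid_values(points, grid, values):
    """Return values[j] for the grid point grid[j] nearest to each entry of `points`.

    `grid` must be sorted increasingly and values[j] is the fitted curve at grid[j].
    """
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    values = np.asarray(values, dtype=float)
    idx = np.searchsorted(grid, points)          # first index with grid[idx] >= point
    idx = np.clip(idx, 1, len(grid) - 1)         # keep both neighbours inside the grid
    left, right = grid[idx - 1], grid[idx]
    take_left = (points - left) <= (right - points)
    idx = idx - take_left.astype(int)            # step back when the left neighbour is closer
    return values[idx]
```

Typical use: `grid = np.linspace(Y.min(), Y.max(), 300)`, evaluate the jackknife fit on `grid`, then `nearest_grid_values(Y, grid, fitted)`.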

In Model 2, we take $n = 2500$, $\sigma(x) = (0.4 + 0.2x^2)^{1/2}$ and $\mathcal{T} = [-1, 1]$. Table 2 shows similar phenomena.

Table 1
Simulated coverage probabilities for Model 1

b_n          0.10     0.12     0.14     0.15     0.16     0.18     0.20
Use q_α      0.9718   0.9716   0.9705   0.9695   0.9671   0.9646   0.9581
Use q*_α     0.9471   0.9498   0.9482   0.9479   0.9463   0.9430   0.9312



6. Application to the S&P 500 Index data. The dataset contains 14374 records, $S_0, S_1, \ldots, S_{14373}$, of S&P 500 Index daily data during the period 3 January 1950 to 20 February 2007. Let $Y_i = \log(S_{i+1}) - \log(S_i)$, $i = 0, 1, \ldots, 14372$, be the log returns (differences of the logarithm of prices). Ding, Granger and Engle [8] considered the daily log return series $Y_i$ of the S&P 500 Index during the period 3 January 1928 to 30 August 1991, and concluded that (i) the S&P 500 returns $Y_i$ are not i.i.d.; and (ii) $Y_i$ do have some short memory and there is a small amount of predictability in stock returns. See also Verhoeven et al. [33], Caporale and Gil-Alana [6], and Awartani and Corradi [2] for more discussions on the serial dependence among the S&P 500 returns.

Here we shall model this dataset using model (1.1) with $(X_i, Y_i) = (Y_{i-1}, Y_i)$, $i = 1, 2, \ldots, 14372$. Since 13562 (94.4%) out of the 14373 $Y_i$'s lie within the range $[-0.017, 0.017]$, we only keep those pairs $(X_i, Y_i)$ for which $X_i \in \mathcal{T} := [-0.017, 0.017]$.
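Constructing these regression pairs is straightforward. The minimal sketch below, with the hypothetical helper `lagged_log_return_pairs`, turns a list of prices into log returns, forms $(X_i, Y_i) = (Y_{i-1}, Y_i)$ and keeps the pairs with $X_i \in [-0.017, 0.017]$; it assumes the prices are positive and ordered in time.

```python
import math

def lagged_log_return_pairs(prices, T1=-0.017, T2=0.017):
    """Build (X_i, Y_i) = (Y_{i-1}, Y_i) from log returns and keep pairs with T1 <= X_i <= T2."""
    returns = [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]
    pairs = list(zip(returns[:-1], returns[1:]))   # (previous return, current return)
    return [(x, y) for x, y in pairs if T1 <= x <= T2]
```

With 14374 prices this yields 14373 returns and 14372 candidate pairs before truncation, matching the index ranges above.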

We now address the bandwidth selection issue. Popular bandwidth selection methods include the plug-in method, the cross-validation approach and Fan and Gijbels's [14] residual squares criterion (RSC) method. As demonstrated in the latter paper, the RSC method is efficient in choosing the bandwidth in local polynomial regression. Here we briefly illustrate the idea. Suppose that we are interested in local linear estimation of $m(x) = \mathbb{E}(Y \mid X = x)$ based on data $(X_i, Y_i)_{i=1}^n$ drawn from the distribution of $(X, Y)$. Let $\mathbf{X}$ be the design matrix whose $i$th row is $(1, X_i - x)$, $1 \le i \le n$, $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$ the response vector, and $\mathbf{K} = \operatorname{diag}\{K_{b_n}(X_1 - x), K_{b_n}(X_2 - x), \ldots, K_{b_n}(X_n - x)\}$ the diagonal weighting matrix. The RSC is defined as

$$\mathrm{RSC}(x; b_n) = \frac{1 + 2\Delta}{\operatorname{trace}\{\mathbf{K} - \mathbf{K}\mathbf{X}\mathbf{S}_n^{-1}\mathbf{X}^T\mathbf{K}\}}(\mathbf{Y} - \mathbf{X}\hat\beta)^T\mathbf{K}(\mathbf{Y} - \mathbf{X}\hat\beta),$$

where $\hat\beta = \mathbf{S}_n^{-1}\mathbf{X}^T\mathbf{K}\mathbf{Y}$, $\mathbf{S}_n = \mathbf{X}^T\mathbf{K}\mathbf{X}$, $\mathbf{S}_n^* = \mathbf{X}^T\mathbf{K}^2\mathbf{X}$ and $\Delta$ is the first diagonal element of the matrix $\mathbf{S}_n^{-1}\mathbf{S}_n^*\mathbf{S}_n^{-1}$. The idea is to choose $b_n$ to minimize the integrated version of the RSC, $\mathrm{IRSC}(b_n) = \int_{\mathcal{T}}\mathrm{RSC}(x; b_n)\,dx$; see Fan and Gijbels [14] for more details.

Table 2
Simulated coverage probabilities for Model 2

Bandwidth    0.16     0.18     0.20     0.22     0.24     0.26     0.28     0.30
Use q_α      0.9625   0.9646   0.9653   0.9688   0.9702   0.9721   0.9679   0.9654
Use q*_α     0.9435   0.9443   0.9490   0.9534   0.9529   0.9572   0.9525   0.9498

With their method, we outline our procedure:

(1) Define $\mathrm{IRSC}_\mu(b_n)$ for bandwidth selection in estimating $\mu$.

(2) Obtain $b_n^*$ by minimizing $\mathrm{IRSC}_\mu(b_n)$.

(3) Get the local linear regression estimates $\hat\mu_{b_n^*}$ and $\hat\mu_{\sqrt{2}b_n^*}$.

(4) Apply the bias-correction procedure and obtain the final estimate $\hat\mu^*_{b_n^*} = 2\hat\mu_{b_n^*} - \hat\mu_{\sqrt{2}b_n^*}$.

(5) Compute the residuals, and define $\mathrm{IRSC}_\sigma(h_n)$ for bandwidth selection in estimating $\sigma^2$.

(6) Obtain $h_n^*$ by minimizing $\mathrm{IRSC}_\sigma(h_n)$.

(7) As in steps (3) and (4), calculate $\hat\sigma^{2*}_{h_n^*}$.
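Steps (1) and (2) can be sketched as follows for the local linear fit, using the RSC formula above with the biweight kernel (an assumption; the kernel is not specified here). The integral defining IRSC is approximated by the trapezoidal rule on a grid over $\mathcal{T}$, and the minimization is over a finite list of candidate bandwidths; both are simplifications for illustration. The identity $\operatorname{trace}\{\mathbf{K}\mathbf{X}\mathbf{S}_n^{-1}\mathbf{X}^T\mathbf{K}\} = \operatorname{trace}(\mathbf{S}_n^{-1}\mathbf{S}_n^*)$ avoids forming $n \times n$ matrices.

```python
import numpy as np

def biweight(u):
    """Quartic kernel on [-1, 1] (the kernel choice is an assumption)."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def rsc(x, X, Y, b):
    """Residual squares criterion RSC(x; b) for a local linear fit at x."""
    k = biweight((X - x) / b)                        # diagonal of K
    D = np.column_stack([np.ones_like(X), X - x])    # design matrix, rows (1, X_i - x)
    S = D.T @ (k[:, None] * D)                       # S_n  = X^T K X
    S_star = D.T @ ((k**2)[:, None] * D)             # S_n* = X^T K^2 X
    S_inv = np.linalg.inv(S)
    beta = S_inv @ (D.T @ (k * Y))                   # local linear coefficients
    resid = Y - D @ beta
    delta = (S_inv @ S_star @ S_inv)[0, 0]           # first diagonal element
    denom = k.sum() - np.trace(S_inv @ S_star)       # trace{K - K X S^{-1} X^T K}
    return (1.0 + 2.0 * delta) * (resid @ (k * resid)) / denom

def irsc(X, Y, b, grid):
    """Trapezoidal approximation of the integral of RSC(x; b) over the grid range."""
    vals = np.array([rsc(x, X, Y, b) for x in grid])
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def select_bandwidth(X, Y, candidates, grid):
    """Step (2): the candidate bandwidth with the smallest IRSC."""
    return min(candidates, key=lambda b: irsc(X, Y, b, grid))
```

Step (5) reuses the same functions with the squared residuals in place of `Y`.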

We find $b_n^* = 0.013$ with $\mathrm{IRSC}_\mu(b_n^*) = 2.292 \times 10^{-6}$. Furthermore, we compute the relative IRSC, $\mathrm{IRSC}_\mu(b_n)/\mathrm{IRSC}_\mu(b_n^*) = 1.226, 1.194, 1.154, 1.114, 1.078, 1.050$ for $b_n = 0.001, 0.002, \ldots, 0.006$, respectively; the IRSC curve tends to be flat after $b_n = 0.005$. To avoid over-smoothing while keeping the IRSC under control, we choose $b_n = 0.005$ as our final bandwidth. Similarly, we choose $h_n = 0.006$, for which the relative IRSC is 1.063. We use the cutoff value $q^*$ and the grid points $T(30) = \{-0.017 + 0.034j/29 : j = 0, 1, \ldots, 29\}$ described in Section 5 to construct SCB for $\mu(\cdot)$ and $\sigma^2(\cdot)$ in model (1.1). The estimated $\nu_\epsilon$ is 6.61. We have also tried various other bandwidths and obtained very similar results.
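As a quick check of the grid used above, the 30 points of $T(30)$ can be generated directly from the stated formula (the helper name `grid_points` is ours):

```python
def grid_points(left=-0.017, width=0.034, m=30):
    # T(m) = {left + width * j / (m - 1) : j = 0, 1, ..., m - 1}
    return [left + width * j / (m - 1) for j in range(m)]

T30 = grid_points()
print(len(T30), round(T30[0], 6), round(T30[-1], 6), round(T30[1] - T30[0], 6))
# 30 -0.017 0.017 0.001172
```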

Interestingly, the 95% SCB for $\mu$ and $\sigma^2$ in Figure 1 suggest that we cannot reject the two null hypotheses that the regression function $\mu(\cdot)$ is linear and that the volatility function $\sigma^2(\cdot)$ is quadratic. The fitted linear equation is $\hat\mu_{\mathrm{linear}}(x) = 0.00022 + 0.138x$ and the fitted quadratic curve is $\hat\sigma^2_{\mathrm{quadratic}}(x) = 0.000058 - 0.0011x + 0.257x^2$. Moreover, since the fitted constant line (solid) $\hat\sigma^2_{\mathrm{constant}}(x) = 0.000068$ for $\sigma^2(\cdot)$ in Figure 1 is not entirely contained within the 95% SCB, we infer that conditional heteroscedasticity is present. Finally, we conclude that the following AR(1)-ARCH(1) model is an adequate fit for the S&P 500 Index data:

$$Y_i = 0.00022 + 0.138\,Y_{i-1} + \epsilon_i\sqrt{0.000058 - 0.0011\,Y_{i-1} + 0.257\,Y_{i-1}^2}. \tag{6.1}$$
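To get a feel for the fitted model (6.1), one can simulate from it. The sketch below assumes i.i.d. standard normal innovations $\epsilon_i$; this is only an illustrative assumption, and it is lighter-tailed than the data suggest (a Gaussian $\epsilon_i$ has $\nu_\epsilon = 2$, whereas the estimate reported above is 6.61). The function name is ours.

```python
import math
import random

def simulate_model_61(n, y0=0.0, seed=0):
    """Simulate Y_1, ..., Y_n from the fitted AR(1)-ARCH(1) model (6.1)."""
    rng = random.Random(seed)
    y, path = y0, []
    for _ in range(n):
        mean = 0.00022 + 0.138 * y                       # fitted linear mean
        var = 0.000058 - 0.0011 * y + 0.257 * y * y      # fitted quadratic volatility
        y = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)  # assumed N(0, 1) innovation
        path.append(y)
    return path

path = simulate_model_61(5000)
m = sum(path) / len(path)
v = sum((p - m) ** 2 for p in path) / (len(path) - 1)
print(f"mean {m:.5f}, sd {math.sqrt(v):.4f}")
```

Swapping `rng.gauss` for a standardized heavier-tailed draw would bring the simulated $\nu_\epsilon$ closer to the estimate.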

Fig. 1. Left: The 95% SCB for the regression function $\mu$; the dotted, long-dashed and solid lines are the estimated curve $\mu^*_{b_n}$, the SCB and the fitted linear line $\hat\mu_{\mathrm{linear}}(x) = 0.00022 + 0.1382x$, respectively. Right: The 95% SCB for the volatility function $\sigma^2$; the dotted, long-dashed, solid quadratic and solid constant lines are the estimated curve $\sigma^{*2}_{h_n}$, the SCB, the fitted quadratic curve $\hat\sigma^2_{\mathrm{quadratic}}(x) = 0.000058 - 0.0011x + 0.2567x^2$ and the constant fit $\hat\sigma^2_{\mathrm{constant}}(x) = 0.000068$, respectively. In both panels the horizontal axis shows log returns (from about $-0.015$ to $0.015$).

The volatility function in (6.1) differs slightly from that of Engle's [10] ARCH model $Y_i = (a^2 + b^2Y_{i-1}^2)^{1/2}\epsilon_i$. The latter implies symmetric conditional volatility, whereas the former allows for asymmetric volatility: the negative coefficient $-0.0011$ of $Y_{i-1}$ in (6.1) is in line with the empirical observation that bad news (a negative return) causes larger volatility than good news (a positive return). The linear term in model (6.1) suggests some amount of predictability in returns. While it is widely accepted that financial markets are efficient and, consequently, that stock prices are not predictable, an increasing number of works question this theory; see Ding, Granger and Engle [8], Fama and French [12], Gropp [20] and references therein. On the other hand, the volatility term satisfies

$$\sqrt{0.000058 - 0.0011\,Y_{i-1} + 0.257\,Y_{i-1}^2} > 0.5\,|Y_{i-1}|$$

for every value of $Y_{i-1}$ (after squaring, the difference $0.000058 - 0.0011y + 0.007y^2$ has a negative discriminant), so it dominates the linear term $0.138\,Y_{i-1}$, and the predictability is fairly weak relative to the noise level.
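The claim that the inequality holds for all $Y_{i-1}$ can also be checked numerically. The sketch below evaluates the discriminant and scans a range of returns much wider than the observed one (the helper `margin` is illustrative):

```python
import math

def margin(y):
    # (volatility term)^2 - (0.5 y)^2 = 0.000058 - 0.0011 y + 0.007 y^2
    return 0.000058 - 0.0011 * y + (0.257 - 0.25) * y * y

disc = 0.0011 ** 2 - 4.0 * (0.257 - 0.25) * 0.000058
ys = [-0.2 + 0.4 * j / 4000 for j in range(4001)]
ok = all(math.sqrt(0.000058 - 0.0011 * y + 0.257 * y * y) > 0.5 * abs(y) >= 0.138 * abs(y)
         for y in ys)
print(disc < 0, min(margin(y) for y in ys) > 0, ok)  # True True True
```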

7. Proofs. This section provides proofs of the results stated in Section 3. Recall $\mathcal{F}_i = (\ldots, \eta_{i-1}, \eta_i)$. Let $\mathcal{G}_i = (\ldots, \eta_i, \eta_{i+1}; \epsilon_i, \epsilon_{i-1}, \ldots)$. By the assumption in Section 2, $\epsilon_i$ is independent of $\mathcal{G}_{i-1}$. In the sequel, with a slight abuse of notation, we also write $\mathcal{F}_i$ (resp. $\mathcal{G}_i$) for the sigma field generated by $\mathcal{F}_i$ (resp. $\mathcal{G}_i$). Recall (2.3) for $H_n$. Throughout the proofs we assume without loss of generality that the kernel $K$ has bounded support $[-1, 1]$.
Recall that $f_X(x \mid \mathcal{F}_{i-1})$ is the conditional density of $X_i$ at $x$ given $\mathcal{F}_{i-1}$. Define

$$I_n(x) = \sum_{i=1}^n [f_X(x \mid \mathcal{F}_{i-1}) - f_X(x)], \qquad x \in \mathbb{R}. \tag{7.1}$$

Lemma 1. Let $T > 0$ be fixed. Then $\|\sup_{|x| \le T}|I_n(x)|\| = O(H_n^{1/2})$.

Proof. By Theorem 1 in Wu [36], $\sup_{x\in\mathbb{R}}[\|I_n(x)\| + \|I_n'(x)\|] = O(H_n^{1/2})$. Then Lemma 1 easily follows since $\sup_{|x|\le T}|I_n(x) - I_n(-T)| \le \int_{-T}^{T}|\partial I_n(x)/\partial x|\,dx$. □

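7.1. A CLT and a maximal deviation result. Let $g$ and $h$ be measurable functions such that $h(\epsilon_0) \in \mathcal{L}^2$ and $\sigma_h^2 = \mathrm{Var}[h(\epsilon_0)] > 0$. For $K \in \mathcal{K}$ define

$$S_n(x) = \sum_{i=1}^n \xi_i(x), \quad\text{where } \xi_i(x) = \frac{g(X_i)\{h(\epsilon_i) - E[h(\epsilon_0)]\}K_{b_n}(x - X_i)}{\sigma_h g(x)\sqrt{nb_n\phi_K f_X(x)}}. \tag{7.2}$$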

In Theorems 4 and 5 below, we establish a central limit theorem and a maximal deviation result for $S_n(x)$, respectively. These results are of independent interest, and they are essential to the proofs of our main results in Section 3.

Theorem 4. Let $x \in \mathbb{R}$ be fixed, $K \in \mathcal{K}$ and $h(\epsilon_0) \in \mathcal{L}^2$. Assume that $f_X(x) > 0$, $g(x) \ne 0$, and $f_X, g \in C^0(\{x\}^\epsilon)$ for some $\epsilon > 0$. Further assume that $b_n \to 0$, $nb_n \to \infty$ and $H_n/n^2 \to 0$. Then $S_n(x) \Rightarrow N(0, 1)$.

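Proof. Since $\epsilon_i$ is independent of $\mathcal{G}_{i-1}$, $\{\xi_i(x)\}_{i=1}^n$ form martingale differences with respect to $\mathcal{G}_i$. By the martingale central limit theorem, it suffices to verify the convergence of the conditional variance and the Lindeberg condition. Let $\gamma_i = g^2(X_i)K_{b_n}^2(x - X_i)$, $u_i = \gamma_i - E(\gamma_i \mid \mathcal{F}_{i-1})$ and $v_i = E(\gamma_i \mid \mathcal{F}_{i-1}) - E(\gamma_i)$. Write

$$\sum_{i=1}^n [\gamma_i - E(\gamma_i)] = M_n + R_n, \quad\text{where } M_n = \sum_{i=1}^n u_i \text{ and } R_n = \sum_{i=1}^n v_i. \tag{7.3}$$

Hereafter we call (7.3) the M/R-decomposition. Since $\{u_i\}_{i=1}^n$ are martingale differences with respect to $\mathcal{F}_i$ and $E(u_i^2) = O(b_n)$, we have $M_n = O_p(\sqrt{nb_n})$. Recall (7.1) for $I_n(x)$. Since $K$ is bounded and has bounded support, by the proof of Lemma 1,

$$\|R_n\| = \Big\|b_n\int_{\mathbb{R}}K^2(u)g^2(x - ub_n)I_n(x - ub_n)\,du\Big\| \le b_n\int_{\mathbb{R}}K^2(u)g^2(x - ub_n)\|I_n(x - ub_n)\|\,du = O(H_n^{1/2}b_n). \tag{7.4}$$

Since $b_n \to 0$, $nb_n \to \infty$ and $H_n/n^2 \to 0$, simple calculations show that

$$\sum_{i=1}^n E[\xi_i^2(x) \mid \mathcal{G}_{i-1}] = \frac{M_n + R_n}{nb_n\phi_K f_X(x)g^2(x)} + \frac{\sum_{i=1}^n E(\gamma_i)}{nb_n\phi_K f_X(x)g^2(x)} \to 1 \quad\text{in probability}. \tag{7.5}$$

There exists $c$ such that $\sup_u|g(u)K_{b_n}(x - u)| < c$ holds for all sufficiently large $n$. Let $\lambda = \sigma_h g(x)[\phi_K f_X(x)]^{1/2}$ and $\tilde h(\epsilon_0) = h(\epsilon_0) - E[h(\epsilon_0)]$. For any $s > 0$, by the independence of $X_0$ and $\epsilon_0$,

$$\begin{aligned}
\sum_{i=1}^n E[\xi_i^2(x)\mathbf{1}_{|\xi_i(x)| > s}]
&= \frac{1}{\lambda^2 b_n}E\big[g^2(X_0)K_{b_n}^2(x - X_0)\tilde h^2(\epsilon_0)\mathbf{1}_{|g(X_0)K_{b_n}(x - X_0)\tilde h(\epsilon_0)| > \lambda s\sqrt{nb_n}}\big]\\
&\le \frac{1}{\lambda^2 b_n}E\big[g^2(X_0)K_{b_n}^2(x - X_0)\tilde h^2(\epsilon_0)\mathbf{1}_{|\tilde h(\epsilon_0)| > \lambda s\sqrt{nb_n}/c}\big]\\
&= \frac{1}{\lambda^2 b_n}E[g^2(X_0)K_{b_n}^2(x - X_0)] \times E\big[\tilde h^2(\epsilon_0)\mathbf{1}_{|\tilde h(\epsilon_0)| > \lambda s\sqrt{nb_n}/c}\big] \to 0,
\end{aligned}$$

in view of $E[g^2(X_0)K_{b_n}^2(x - X_0)] = O(b_n)$, $nb_n \to \infty$ and $h(\epsilon_0) \in \mathcal{L}^2$. So the Lindeberg condition holds. □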
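The M/R-decomposition (7.3) splits a centered sum into a martingale part $M_n$ and a part $R_n$ driven by the conditional means. The following toy illustration is a sketch under our own assumptions: a Gaussian AR(1) design $X_i = aX_{i-1} + \eta_i$ and the indicator $\gamma_i = \mathbf{1}\{X_i \le x\}$ in place of the kernel weights, chosen so that $E(\gamma_i \mid \mathcal{F}_{i-1}) = \Phi(x - aX_{i-1})$ has a closed form.

```python
import math
import random

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mr_decomposition(n, a=0.5, x=0.3, seed=1):
    """Return (M_n, R_n, sum of gamma_i - E gamma_i) for gamma_i = 1{X_i <= x}."""
    rng = random.Random(seed)
    prev = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - a * a))   # start in the stationary law
    uncond = norm_cdf(x * math.sqrt(1.0 - a * a))          # E(gamma_i)
    M = R = total = 0.0
    for _ in range(n):
        cond = norm_cdf(x - a * prev)                      # E(gamma_i | F_{i-1})
        cur = a * prev + rng.gauss(0.0, 1.0)
        gamma = 1.0 if cur <= x else 0.0
        M += gamma - cond                                  # martingale part u_i
        R += cond - uncond                                 # predictable part v_i
        total += gamma - uncond
        prev = cur
    return M, R, total

M, R, total = mr_decomposition(10000)
print(round(M, 2), round(R, 2), round(total, 2))
```

In this short-memory example both parts are typically of order $\sqrt{n}$; the identity $M_n + R_n = \sum_i[\gamma_i - E(\gamma_i)]$ holds exactly.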

Recall Theorem 2 for the definitions of $m_n$ and $T_n$. Let $S_n(x)$ be as in (7.2). Theorem 5 below provides a maximal deviation result for $\sup_{x\in T_n}|S_n(x)|$. Results of this type are essential to the construction of SCB (cf. Bickel and Rosenblatt [4], Johnston [26] and Eubank and Speckman [11], among others). To obtain a maximal deviation result under dependence, we apply Grama and Haeusler's [19] martingale moderate deviation theorem.

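Theorem 5. Let $K \in \mathcal{K}$ and $h(\epsilon_0) \in \mathcal{L}^3$. Assume $\inf_{u\in T}f_X(u) > 0$, $g(x) \ne 0$ for $x \in T$, and $f_X, g \in C^4(T^\epsilon)$ for some $\epsilon > 0$. Further assume that

$$b_n^{4/3}\log n + \frac{(\log n)^3}{nb_n^3} + \frac{H_n(\log n)^2}{n^2b_n^{4/3}} \to 0. \tag{7.6}$$

Let $B_m(z)$ and $T_n$ be as in Theorem 2. Then

$$\lim_{n\to\infty}P\Big\{\sup_{x\in T_n}|S_n(x)| \le B_{m_n}(z)\Big\} = e^{-2e^{-z}}, \qquad z \in \mathbb{R}. \tag{7.7}$$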
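The limit in (7.7) is what turns Theorem 5 into a band: choosing $z$ with $\exp(-2e^{-z}) = 1 - \alpha$ gives asymptotic coverage $1 - \alpha$. The sketch below solves for $z$ and, for intuition only, computes a Gaussian threshold $B$ with $P(|N| > B) = 2e^{-z}/m$, the tail relation used in the proof below. The helper `gaussian_threshold` is hypothetical and is not the explicit $B_m(z)$ of Theorem 2.

```python
import math

def limit_cdf(z):
    # Right-hand side of (7.7)
    return math.exp(-2.0 * math.exp(-z))

def z_for_level(alpha):
    # Solve exp(-2 exp(-z)) = 1 - alpha for z
    return -math.log(-math.log(1.0 - alpha) / 2.0)

def gaussian_threshold(m, z):
    # Hypothetical helper: B with P(|N| > B) = 2 exp(-z) / m, found by bisection
    target = 2.0 * math.exp(-z) / m
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid / math.sqrt(2.0)) > target:
            lo = mid          # tail probability still too large: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

z95 = z_for_level(0.05)
print(round(z95, 4))  # 3.6633
for m in (10, 1000, 100000):
    B = gaussian_threshold(m, z95)
    p_max = (1.0 - 2.0 * math.exp(-z95) / m) ** (m + 1)   # i.i.d. analogue of (7.7)
    print(m, round(B, 3), round(p_max, 4))
```

As $m$ grows, the i.i.d. probability `p_max` approaches $0.95$, mirroring (7.7).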
Proof. Recall (7.2) for $\xi_i(x)$. For fixed $k \in \mathbb{N}$ and mutually different integers $0 \le j_1, j_2, \ldots, j_k \le m_n$, let the $k$-dimensional vectors $\zeta_i = [\xi_i(x_{j_1}), \xi_i(x_{j_2}), \ldots, \xi_i(x_{j_k})]^T$ and $S_{n,k} = \sum_{i=1}^n\zeta_i = [S_n(x_{j_1}), S_n(x_{j_2}), \ldots, S_n(x_{j_k})]^T$, where $T$ denotes transpose. Then $\{\zeta_i\}_{i=1}^n$ are $k$-dimensional martingale differences with respect to $\mathcal{G}_i$. Let $Q$ denote the quadratic characteristic matrix of $S_{n,k}$, namely,

$$Q = \sum_{i=1}^n E(\zeta_i\zeta_i^T \mid \mathcal{G}_{i-1}) := (Q_{rr'})_{1\le r,r'\le k}. \tag{7.8}$$

Let $\Gamma_{rr'} = \phi_K g(x_{j_r})g(x_{j_{r'}})[f_X(x_{j_r})f_X(x_{j_{r'}})]^{1/2}$ and write

$$Q_{rr'} = \sum_{i=1}^n E[\xi_i(x_{j_r})\xi_i(x_{j_{r'}}) \mid \mathcal{G}_{i-1}] = \frac{1}{nb_n\Gamma_{rr'}}\sum_{i=1}^n g^2(X_i)K_{b_n}(x_{j_r} - X_i)K_{b_n}(x_{j_{r'}} - X_i).$$

For $r \ne r'$, since $|x_{j_r} - x_{j_{r'}}| > 2b_n$ and $K$ has support $[-1, 1]$, $Q_{rr'} = 0$. For $r = r'$, we use the M/R-decomposition technique in (7.3). Define

$$\alpha_i(r) = g^2(X_i)K_{b_n}^2(x_{j_r} - X_i) - E[g^2(X_i)K_{b_n}^2(x_{j_r} - X_i) \mid \mathcal{F}_{i-1}],$$

$$\beta_i(r) = E[g^2(X_i)K_{b_n}^2(x_{j_r} - X_i) \mid \mathcal{F}_{i-1}] - E[g^2(X_i)K_{b_n}^2(x_{j_r} - X_i)].$$

Since $\{\alpha_i(r)\}_{i=1}^n$ form martingale differences with respect to $\mathcal{F}_i$, we have

$$\Big\|\sum_{i=1}^n\alpha_i(r)\Big\| = \Big[\sum_{i=1}^n\|\alpha_i(r)\|^2\Big]^{1/2} = O(\sqrt{nb_n}), \tag{7.9}$$

uniformly over $r$. By Schwarz's inequality and Lemma 1, as in (7.4), we have

$$\Big\|\sum_{i=1}^n\beta_i(r)\Big\| = \Big\|b_n\int_{\mathbb{R}}K^2(u)g^2(x_{j_r} - ub_n)I_n(x_{j_r} - ub_n)\,du\Big\| \le b_n\int_{\mathbb{R}}K^2(u)g^2(x_{j_r} - ub_n)\|I_n(x_{j_r} - ub_n)\|\,du = O(H_n^{1/2}b_n), \tag{7.10}$$

uniformly over $r$. Since $f_X, g \in C^4(T^\epsilon)$ and $K \in \mathcal{K}$, by Taylor's expansion,

$$\sum_{i=1}^n E[g^2(X_i)K_{b_n}^2(x_{j_r} - X_i)] - nb_n\Gamma_{rr} = O(nb_n^3). \tag{7.11}$$
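Combining (7.9), (7.10) and (7.11) with the inequality $E|Z|^{3/2} \le \|Z\|^{3/2}$, we obtain

$$E|Q_{rr} - 1|^{3/2} \le 3^{1/2}\Big\{E\Big|\frac{\sum_{i=1}^n\alpha_i(r)}{nb_n\Gamma_{rr}}\Big|^{3/2} + E\Big|\frac{\sum_{i=1}^n\beta_i(r)}{nb_n\Gamma_{rr}}\Big|^{3/2} + \Big|\frac{\sum_{i=1}^nE[g^2(X_i)K_{b_n}^2(x_{j_r} - X_i)]}{nb_n\Gamma_{rr}} - 1\Big|^{3/2}\Big\} = O(\delta_n^{3/2}), \tag{7.12}$$

uniformly over $r$, where $\delta_n = (nb_n)^{-1/2} + H_n^{1/2}/n + b_n^2$. Let $I_k = \mathrm{diag}(1, 1, \ldots, 1) = (\iota_{rr'})_{1\le r,r'\le k}$ be the $k \times k$ identity matrix. Then $E|Q_{rr'} - \iota_{rr'}|^{3/2} = O(\delta_n^{3/2})$, uniformly over $1 \le r, r' \le k$. It is easily seen that $\sum_{i=1}^nE|\xi_i(x_{j_r})|^3 = O[(nb_n)^{-1/2}]$ uniformly over $1 \le r \le k$. Then

$$\sum_{r=1}^k\sum_{i=1}^nE|\xi_i(x_{j_r})|^3 + \sum_{r,r'=1}^kE|Q_{rr'} - \iota_{rr'}|^{3/2} = O(\Delta_n), \quad\text{where } \Delta_n = (nb_n)^{-1/2} + \delta_n^{3/2}.$$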
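Under (7.6), elementary calculations show that $[1 + B_{m_n}(z)]^4\exp[B_{m_n}^2(z)/2] \times \Delta_n \to 0$ for fixed $z$. Let $A_j$ denote the event $\{|S_n(x_j)| > B_{m_n}(z)\}$ and $E_{m_n} = \bigcup_{j=0}^{m_n}A_j$, and let $N_1, N_2, \ldots$ be i.i.d. standard normals. By Theorem 1 in Grama and Haeusler [19],

$$P\Big(\bigcap_{r=1}^kA_{j_r}\Big) = [1 + o(1)]\,P\Big(\bigcap_{r=1}^k\{|N_r| > B_{m_n}(z)\}\Big) = [1 + o(1)]\Big(\frac{2e^{-z}}{m_n}\Big)^k, \tag{7.13}$$

in view of $P(N_1 > x) = [1 + o(1)]\phi(x)/x$ as $x \to \infty$, where $\phi$ is the standard normal density function. Notice that $P\{\sup_{x\in T_n}|S_n(x)| > B_{m_n}(z)\} = P(E_{m_n})$. By the inclusion-exclusion (Bonferroni) inequality, we have, for large enough $n$,

$$\begin{aligned}
P(E_{m_n}) &\le \sum_{j=0}^{m_n}P(A_j) - \sum_{0\le j_1<j_2\le m_n}P(A_{j_1}\cap A_{j_2}) + \cdots + \sum_{0\le j_1<\cdots<j_{2k-1}\le m_n}P\Big(\bigcap_{r=1}^{2k-1}A_{j_r}\Big)\\
&= \sum_{r=1}^{2k-1}(-1)^{r-1}\binom{m_n+1}{r}\Big(\frac{2e^{-z}}{m_n}\Big)^r[1 + o(1)]\\
&= -\sum_{r=1}^{2k-1}\frac{(-2e^{-z})^r}{r!} + o(1).
\end{aligned} \tag{7.14}$$

Thus, letting $\Gamma_k = \sum_{r=1}^k[-2\exp(-z)]^r/r!$, we have $\limsup_{n\to\infty}P(E_{m_n}) \le -\Gamma_{2k-1}$. Similarly, $\liminf_{n\to\infty}P(E_{m_n}) \ge -\Gamma_{2k}$. Since $\lim_{k\to\infty}(-\Gamma_k) = 1 - \exp[-2\exp(-z)]$, (7.7) follows. □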
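The Bonferroni step can be seen numerically: the odd partial sums $-\Gamma_{2k-1}$ and the even partial sums $-\Gamma_{2k}$ bracket $1 - \exp(-2e^{-z})$ and close in on it as $k$ grows. A short check (the value $z = 0.5$ is arbitrary):

```python
import math

def gamma_partial(k, z):
    """Gamma_k = sum_{r=1}^{k} (-2 e^{-z})^r / r!."""
    c = -2.0 * math.exp(-z)
    return sum(c ** r / math.factorial(r) for r in range(1, k + 1))

z = 0.5
limit = 1.0 - math.exp(-2.0 * math.exp(-z))   # limit of P(E_{m_n})
for k in range(1, 5):
    lower, upper = -gamma_partial(2 * k, z), -gamma_partial(2 * k - 1, z)
    print(k, f"{lower:.6f} <= {limit:.6f} <= {upper:.6f}")
```

The bracketing holds for every $z$, because $-\Gamma_k = 1 - \sum_{r=0}^k(-x)^r/r!$ with $x = 2e^{-z} > 0$, and the Taylor polynomials of $e^{-x}$ alternate around it.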
7.2. Proof of the results in Section 3. Before proving Theorems 1 and 2, we first note that, by the definitions of $\hat\mu_{b_n}(x)$ and $\hat f_X(x)$ in (3.1),

$$\frac{\sqrt{nb_nf_X(x)}}{\sigma(x)\sqrt{\phi_K}}\Big\{\hat\mu_{b_n}(x) - \mu(x) - \frac{U_n(x)}{\omega_n(x)}\Big\} = \frac{V_n(x)}{\omega_n(x)}, \tag{7.15}$$

where $\omega_n(x) = \hat f_X(x)/f_X(x)$,

$$U_n(x) = \frac{1}{nb_nf_X(x)}\sum_{i=1}^nK_{b_n}(x - X_i)[\mu(X_i) - \mu(x)], \tag{7.16}$$

$$V_n(x) = \frac{1}{\sigma(x)\sqrt{nb_n\phi_Kf_X(x)}}\sum_{i=1}^n\sigma(X_i)\epsilon_iK_{b_n}(x - X_i). \tag{7.17}$$

In (7.15), we can view $U_n(x)$ [resp. $V_n(x)$] as the bias (resp. stochastic) part of $\hat\mu_{b_n}(x) - \mu(x)$. The stochastic part $V_n(x)$ is treated in Proposition 4 and Theorem 5. The following Lemma 2 concerns $\omega_n(x)$ and the bias part $U_n(x)$.
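Decomposition (7.15) is purely algebraic once the estimator is fixed. The sketch below checks the equivalent identity $\omega_n(x)[\hat\mu_{b_n}(x) - \mu(x)] = U_n(x) + \sum_i\sigma(X_i)\epsilon_iK_{b_n}(x - X_i)/[nb_nf_X(x)]$ on simulated data. It assumes that $\hat\mu_{b_n}$ in (3.1) is the Nadaraya-Watson estimator with $K_{b_n}(u) = K(u/b_n)$ and $\hat f_X(x) = \sum_iK_{b_n}(x - X_i)/(nb_n)$; the uniform design, $\mu = \sin$, the function $\sigma$ and the Epanechnikov kernel are arbitrary illustrative choices.

```python
import math
import random

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def sigma(t):
    # Illustrative volatility function
    return 0.5 + 0.2 * t * t

rng = random.Random(2)
n, b, x, f_x = 2000, 0.2, 0.2, 0.5                 # f_x: Uniform(-1, 1) density at x
X = [rng.uniform(-1.0, 1.0) for _ in range(n)]
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
Y = [math.sin(xi) + sigma(xi) * ei for xi, ei in zip(X, eps)]   # mu = sin

k = [epanechnikov((x - xi) / b) for xi in X]       # K_b(x - X_i) = K((x - X_i) / b)
f_hat = sum(k) / (n * b)
mu_hat = sum(ki * yi for ki, yi in zip(k, Y)) / sum(k)          # Nadaraya-Watson
omega = f_hat / f_x
U = sum(ki * (math.sin(xi) - math.sin(x)) for ki, xi in zip(k, X)) / (n * b * f_x)
W = sum(ki * sigma(xi) * ei for ki, xi, ei in zip(k, X, eps)) / (n * b * f_x)
print(abs(omega * (mu_hat - math.sin(x)) - (U + W)) < 1e-12)   # True
```

Dividing `W` by $\sigma(x)[\phi_K/(nb_nf_X(x))]^{1/2}$ gives $V_n(x)$ of (7.17).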

Lemma 2. Let $K \in \mathcal{K}$.

(i) Recall the definition of $\hat f_X(x)$ in (3.1). Assume that $f_X \in C^4(T^\epsilon)$ for some $\epsilon > 0$. Then

$$\sup_{x\in T}|\hat f_X(x) - f_X(x)| = O_p(q_n), \tag{7.18}$$

where $q_n = \sqrt{\log n/(nb_n)} + b_n^2 + H_n^{1/2}/n$.

(ii) Recall the definition of $U_n(x)$ in (7.16). Let $\rho_\mu(x)$ be as in Theorem 1. Assume that $f_X, \mu \in C^4(T^\epsilon)$ for some $\epsilon > 0$ and $\inf_{x\in T}f_X(x) > 0$. Then

$$\sup_{x\in T}|U_n(x) - b_n^2\psi_K\rho_\mu(x)| = O_p(r_n), \tag{7.19}$$

where $r_n = \sqrt{b_n\log n/n} + b_n^4 + H_n^{1/2}b_n/n$.

(iii) Let $g, h$ be measurable functions such that $h(\epsilon_0) \in \mathcal{L}^q$ for some $q > 2$ and $g \in C^0(T^\epsilon)$ for some $\epsilon > 0$. Then

$$\sup_{x\in T}\frac{1}{nb_n}\Big|\sum_{i=1}^nK_{b_n}(x - X_i)g(X_i)[h(\epsilon_i) - Eh(\epsilon_i)]\Big| = O_p[\chi_n(q)], \tag{7.20}$$

where $\chi_n(q) = \sqrt{\log n/(nb_n)} + n^{-q/4}b_n^{-1-q/4}(\log n)^{q/4-1/2}$.



Proof. (i) We shall use the M/R-decomposition technique in (7.3). By the chain argument in Lemma 4 in Zhao and Wu [39], we can show that

$$\sup_{x\in T}\Big|\sum_{i=1}^n\{K_{b_n}(x - X_i) - E[K_{b_n}(x - X_i) \mid \mathcal{F}_{i-1}]\}\Big| = O_p(\sqrt{nb_n\log n}). \tag{7.21}$$
By Lemma 1, since $K$ has bounded support, we have, uniformly over $x \in T$,

$$\sum_{i=1}^n\{E[K_{b_n}(x - X_i) \mid \mathcal{F}_{i-1}] - E[K_{b_n}(x - X_i)]\} = b_n\int_{\mathbb{R}}K(u)I_n(x - ub_n)\,du = O_p(H_n^{1/2}b_n). \tag{7.22}$$

Since $K \in \mathcal{K}$, by Taylor's expansion, $\sum_{i=1}^nE[K_{b_n}(x - X_i)] - nb_nf_X(x) = O(nb_n^3)$. Therefore, (i) follows from (7.21) and (7.22).

(ii) Let $\gamma_i(x) = K_{b_n}(x - X_i)[\mu(X_i) - \mu(x)]$. Again we use the M/R-decomposition technique in (7.3). By the chain argument in Lemma 4 in Zhao and Wu [39], we can show that $\sup_{x\in T}|\sum_{i=1}^n\{\gamma_i(x) - E[\gamma_i(x) \mid \mathcal{F}_{i-1}]\}| = O_p[(nb_n^3\log n)^{1/2}]$. As in (7.22), we can show that $\sup_{x\in T}|\sum_{i=1}^n\{E[\gamma_i(x) \mid \mathcal{F}_{i-1}] - E[\gamma_i(x)]\}| = O_p(H_n^{1/2}b_n^2)$. Since $K \in \mathcal{K}$, elementary calculations show that $\sum_{i=1}^nE[\gamma_i(x)] = nb_n^3f_X(x)\psi_K\rho_\mu(x) + O(nb_n^5)$. So (ii) follows.

(iii) We shall only consider the special case $h(u) = u$, since the other cases follow similarly. Let $c_n = (nb_n/\log n)^{1/2}$ and define

$$\bar D_n(x) = \sum_{i=1}^n\bar d_i(x), \quad\text{where } \bar d_i(x) = K_{b_n}(x - X_i)g(X_i)[\epsilon_i\mathbf{1}_{|\epsilon_i|>c_n} - E(\epsilon_i\mathbf{1}_{|\epsilon_i|>c_n})],$$

$$D_n(x) = \sum_{i=1}^nd_i(x), \quad\text{where } d_i(x) = K_{b_n}(x - X_i)g(X_i)[\epsilon_i\mathbf{1}_{|\epsilon_i|\le c_n} - E(\epsilon_i\mathbf{1}_{|\epsilon_i|\le c_n})]/c_n.$$

Note that, for each fixed $x$, $\{\bar d_i(x)\}_{i=1}^n$ and $\{d_i(x)\}_{i=1}^n$ form martingale differences with respect to $\mathcal{G}_i$, and $E(\epsilon_i^2\mathbf{1}_{|\epsilon_i|>c_n}) \le E(|\epsilon_i|^q)/c_n^{q-2} = O(c_n^{2-q})$ for $q > 2$. Simple calculations show that $\|\bar D_n(x)\|^2 = \sum_{i=1}^n\|\bar d_i(x)\|^2 = O(nb_nc_n^{2-q})$ and $\|\partial\bar D_n(x)/\partial x\|^2 = O(nc_n^{2-q}/b_n)$, uniformly over $x \in T$. Since $\sup_{x\in T}|\bar D_n(x) - \bar D_n(T_1)| \le \int_T|\partial\bar D_n(u)/\partial u|\,du$, by Schwarz's inequality we have $E[\sup_{x\in T}|\bar D_n(x)|^2] = O(nc_n^{2-q}/b_n)$. Since $\{d_i(x)\}_{i=1}^n$ are uniformly bounded martingale differences, by the argument in the proof of Lemma 4 in Zhao and Wu [39], $\sup_{x\in T}|D_n(x)| = O_p[(nb_n\log n)^{1/2}/c_n]$. Note that $E(\epsilon_i) = 0$. Then $|\sum_{i=1}^nK_{b_n}(x - X_i)g(X_i)\epsilon_i| \le |\bar D_n(x)| + c_n|D_n(x)|$. Dividing by $nb_n$ and substituting $c_n$ yields the two terms of $\chi_n(q)$, so (iii) follows. □

Proof of Theorems 1 and 2. Recall the definitions of $\omega_n(x)$, $U_n(x)$ and $V_n(x)$ in (7.15). Applying the M/R-decomposition technique in (7.3), we can show that, for fixed $x$, $\hat f_X(x) - f_X(x) = O_p[(nb_n)^{-1/2} + b_n^2 + H_n^{1/2}/n]$ and $U_n(x) - b_n^2\psi_K\rho_\mu(x) = O_p[(b_n/n)^{1/2} + b_n^4 + H_n^{1/2}b_n/n]$. Then $\omega_n(x) = 1 + O_p[(nb_n)^{-1/2} + b_n^2 + H_n^{1/2}/n]$. By Theorem 4, $V_n(x) \Rightarrow N(0, 1)$. Under condition (3.2), Theorem 1 then follows from Slutsky's theorem.

By Lemma 2, $\sup_{x\in T}|\omega_n(x) - 1| = O_p(q_n)$ and $\sup_{x\in T}|U_n(x) - b_n^2\psi_K\rho_\mu(x)| = O_p(r_n)$, where $q_n$ and $r_n$ are as in (7.18) and (7.19), respectively. Under condition (3.3), simple calculations show that $q_n\log n + (nb_n\log n)^{1/2}r_n \to 0$ and that (7.6) holds. So Theorem 2 follows from the decomposition (7.15) in view of Slutsky's theorem and Theorem 5. □
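The truncation in the proof of Lemma 2(iii) rests on the identity $\sum_iK_{b_n}(x - X_i)g(X_i)\epsilon_i = \bar D_n(x) + c_nD_n(x)$, which holds because $E(\epsilon_i\mathbf{1}_{|\epsilon_i|\le c_n}) + E(\epsilon_i\mathbf{1}_{|\epsilon_i|>c_n}) = E(\epsilon_i) = 0$. A toy check, assuming a two-point mean-zero law for $\epsilon_i$ (so that both truncated means are known exactly) and random weights standing in for $K_{b_n}(x - X_i)g(X_i)$:

```python
import random

def truncation_split(a, eps, c, mean_small, mean_large):
    """Return (Dbar, D) with sum(a_i * eps_i) = Dbar + c * D when E(eps) = 0."""
    Dbar = sum(ai * ((e if abs(e) > c else 0.0) - mean_large) for ai, e in zip(a, eps))
    D = sum(ai * ((e if abs(e) <= c else 0.0) - mean_small) for ai, e in zip(a, eps)) / c
    return Dbar, D

rng = random.Random(3)
# eps = -2 with probability 1/3 and 1 with probability 2/3, so E(eps) = 0
eps = [-2.0 if rng.random() < 1.0 / 3.0 else 1.0 for _ in range(1000)]
a = [rng.uniform(0.0, 1.0) for _ in range(1000)]   # stand-ins for K_b(x - X_i) g(X_i)
# With c = 1.5: E(eps 1{|eps| <= c}) = 2/3 and E(eps 1{|eps| > c}) = -2/3
Dbar, D = truncation_split(a, eps, c=1.5, mean_small=2.0 / 3.0, mean_large=-2.0 / 3.0)
print(abs(sum(ai * e for ai, e in zip(a, eps)) - (Dbar + 1.5 * D)) < 1e-9)  # True
```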



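Proof of Proposition 1 and Theorem 3. Let $q_n$, $r_n$ and $\chi_n(q)$ be as in (7.18), (7.19) and (7.20) in Lemma 2, respectively. Accordingly, we define $\tilde q_n$, $\tilde r_n$ and $\tilde\chi_n(q)$ by replacing $b_n$ in $q_n$, $r_n$ and $\chi_n(q)$ with $h_n$. For example, $\tilde q_n = [\log n/(nh_n)]^{1/2} + h_n^2 + H_n^{1/2}/n$. Recall the definitions of $\omega_n(x)$ and $U_n(x)$ in (7.15). Define

$$W_n(x) = \frac{\sum_{i=1}^n\sigma(X_i)\epsilon_iK_{b_n}(x - X_i)}{nb_nf_X(x)}, \qquad W_n^*(x) = \frac{\sum_{i=1}^n\sigma(X_i)\epsilon_iK_{b_n}^*(x - X_i)}{nb_nf_X(x)},$$

where $K^*(u) = 2K(u) - K(u/\sqrt2)/\sqrt2$. Applying Lemma 2(iii) with $h(u) = u$ and $q = 6$, we have $\sup_{x\in T}|W_n^*(x)| = O_p[\chi_n(6)]$. So, by Lemma 2(i) and (ii), elementary calculations show that, uniformly over $x \in T$,

$$\hat\mu_{b_n}(x) - \mu(x) = \frac{U_n(x)}{\omega_n(x)} + \frac{W_n(x)}{\omega_n(x)} = b_n^2\psi_K\rho_\mu(x) + W_n(x) + O_p(\Lambda_n), \tag{7.23}$$

where $\Lambda_n = r_n + q_n[b_n^2 + \chi_n(6)]$. Consequently, $\mu^*_{b_n}(x) = \mu(x) + W_n^*(x) + O_p(\Lambda_n)$. Let $\tilde f_X$ be as in (3.8). By definition,

$$\begin{aligned}
\frac{1}{nh_nf_X(x)}\sum_{i=1}^n[Y_i - \mu^*_{b_n}(X_i)]^2K_{h_n}(x - X_i)
&= \frac{1}{nh_nf_X(x)}\sum_{i=1}^n[\sigma(X_i)\epsilon_i - W_n^*(X_i) + O_p(\Lambda_n)]^2K_{h_n}(x - X_i)\\
&= \frac{T_n(x)}{nh_nf_X(x)} + O_p\Big[\frac{|L_n(x)| + \Lambda_nJ_n(x)}{nh_n} + \chi_n^2(6) + \Lambda_n^2\Big],
\end{aligned} \tag{7.24}$$

where

$$T_n(x) = \sum_{i=1}^n\sigma^2(X_i)\epsilon_i^2K_{h_n}(x - X_i), \qquad L_n(x) = \sum_{i=1}^n\sigma(X_i)\epsilon_iW_n^*(X_i)K_{h_n}(x - X_i)$$

and

$$J_n(x) = \sum_{i=1}^n\sigma(X_i)|\epsilon_i|K_{h_n}(x - X_i).$$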
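The bias cancellation behind $\mu^*_{b_n}(x) = \mu(x) + W_n^*(x) + O_p(\Lambda_n)$ comes from the moments of $K^*$: $\int K^* = 2 - 1 = 1$, while the substitution $u = \sqrt2 v$ gives $\int u^2K^*(u)\,du = 2\int u^2K - 2\int u^2K = 0$. A numerical check, assuming the Epanechnikov kernel (for which $\int u^2K = 1/5$):

```python
import math

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def k_star(u):
    # K*(u) = 2K(u) - K(u / sqrt(2)) / sqrt(2), with K Epanechnikov (assumed)
    r = math.sqrt(2.0)
    return 2.0 * epanechnikov(u) - epanechnikov(u / r) / r

def moment(fun, p, lo=-2.0, hi=2.0, steps=200000):
    # Midpoint rule for the integral of u^p * fun(u) over [lo, hi]
    h = (hi - lo) / steps
    return h * sum((lo + (i + 0.5) * h) ** p * fun(lo + (i + 0.5) * h) for i in range(steps))

print(abs(moment(epanechnikov, 2) - 0.2) < 1e-6)                          # True
print(abs(moment(k_star, 0) - 1.0) < 1e-6, abs(moment(k_star, 2)) < 1e-6)  # True True
```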
By the argument in Lemma 2(i), we can show that $\sum_{i=1}^n\sigma(X_i)K_{h_n}(x - X_i) = O_p[nh_n(1 + \tilde q_n)]$, uniformly over $x \in T$. By Lemma 2(iii) with $h(u) = |u|$ and $q = 6$, we have, uniformly over $x \in T$,

$$J_n(x) = \sum_{i=1}^n\sigma(X_i)[|\epsilon_i| - E|\epsilon_i|]K_{h_n}(x - X_i) + E|\epsilon_0|\sum_{i=1}^n\sigma(X_i)K_{h_n}(x - X_i) = O_p\{nh_n[1 + \tilde q_n + \tilde\chi_n(6)]\}. \tag{7.25}$$
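Since $\epsilon_i$ is independent of $\mathcal{G}_{i-1}$ and $E(\epsilon_i) = 0$, a simple calculation shows that

$$\|L_n(x)\|^2 = E\Big[\sum_{i,j=1}^n\frac{\sigma(X_i)\sigma(X_j)}{nb_nf_X(X_i)}\epsilon_i\epsilon_jK_{b_n}^*(X_i - X_j)K_{h_n}(x - X_i)\Big]^2 = O(h_n/b_n^2),$$

uniformly over $x \in T$. Likewise, $\|\partial L_n(x)/\partial x\|^2 = O[1/(h_nb_n^2)]$. Since $\sup_{x\in T}|L_n(x) - L_n(T_1)| \le \int_T|\partial L_n(u)/\partial u|\,du$, by Schwarz's inequality, $\sup_{x\in T}|L_n(x)| = O_p[1/(h_n^{1/2}b_n)]$. Recall that $E(\epsilon_0^2) = 1$. Write

$$\frac{T_n(x)}{nh_nf_X(x)} - \sigma^2(x)\frac{\tilde f_X(x)}{f_X(x)} = \frac{D_n(x) + E_n(x)}{nh_nf_X(x)}, \tag{7.26}$$

where

$$D_n(x) = \sum_{i=1}^n[\sigma^2(X_i) - \sigma^2(x)]K_{h_n}(x - X_i), \qquad E_n(x) = \sum_{i=1}^n\sigma^2(X_i)[\epsilon_i^2 - E(\epsilon_i^2)]K_{h_n}(x - X_i).$$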

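Let $\tilde\omega_n(x) = \tilde f_X(x)/f_X(x)$. By Lemma 2(i), $\sup_{x\in T}|\tilde\omega_n(x) - 1| = O_p(\tilde q_n)$. Let $\rho_\sigma(x)$ be as in Theorem 3. As in Lemma 2(ii), we can show that

$$\sup_{x\in T}\Big|\frac{D_n(x)}{nh_nf_X(x)} - h_n^2\psi_K\rho_\sigma(x)\Big| = O_p(\tilde r_n). \tag{7.27}$$

Thus, writing $\hat\sigma^2_{h_n}(x) = \sum_{i=1}^n[Y_i - \mu^*_{b_n}(X_i)]^2K_{h_n}(x - X_i)/\sum_{i=1}^nK_{h_n}(x - X_i)$ for the kernel-smoothed squared residuals, by (7.24), (7.25), (7.26) and (7.27) we have, uniformly over $x \in T$,

$$\tilde\omega_n(x)[\hat\sigma^2_{h_n}(x) - \sigma^2(x)] - h_n^2\psi_K\rho_\sigma(x) = \frac{E_n(x)}{nh_nf_X(x)} + O_p(\ell_n), \tag{7.28}$$

where $\ell_n = (nh_n^{3/2}b_n)^{-1} + \tilde r_n + [1 + \tilde q_n + \tilde\chi_n(6)]\Lambda_n + \chi_n^2(6) + \Lambda_n^2$.

Applying Lemma 2(iii) with $h(x) = x^2$ and $q = 3$, we have $\sup_{x\in T}|E_n(x)| = O_p[nh_n\tilde\chi_n(3)]$. When $h_n \asymp b_n$ and condition (3.9) is satisfied, Proposition 1 follows from (7.28) by simplifying $\ell_n + h_n^2 + \tilde\chi_n(3)$. Since there are no essential difficulties, we omit the details.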

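For the proof of Theorem 3, let $\varrho_n(x) = \sigma^2(x)/\hat\sigma^2_{h_n}(x)$. Since $E(\epsilon_0) = 0$, $E(\epsilon_0^2) = 1$ and $\epsilon_0$ has a continuous density, we have $\nu_\epsilon = E(\epsilon_0^4) - 1 > 0$. Let $\bar V_n(x) = E_n(x)/[\sigma^2(x)\sqrt{nh_n\nu_\epsilon\phi_Kf_X(x)}]$; this is the statistic $S_n(x)$ of (7.2) with $g = \sigma^2$, $h(u) = u^2$ and $b_n$ replaced by $h_n$. By (7.28),

$$\frac{\sqrt{nh_nf_X(x)}}{\hat\sigma^2_{h_n}(x)\sqrt{\nu_\epsilon\phi_K}}\Big\{\hat\sigma^2_{h_n}(x) - \sigma^2(x) - \frac{h_n^2\psi_K\rho_\sigma(x)}{\tilde\omega_n(x)}\Big\} = \frac{\varrho_n(x)\bar V_n(x)}{\tilde\omega_n(x)} + O_p(\sqrt{nh_n}\,\ell_n). \tag{7.29}$$

By Proposition 1 and (3.10), $\sup_{x\in T}|\varrho_n(x) - 1| = o_p(1/\log n)$. Also, it is easy to check that $\sup_{x\in T}|\tilde\omega_n(x) - 1| = o_p(1/\log n)$ and $(nh_n\log n)^{1/2}\ell_n \to 0$. Thus, Theorem 3 follows from Theorem 5 via Slutsky's theorem. □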

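Proof of Proposition 2. As shown in the proof of Proposition 1, we have $\sup_{x\in T}|\mu^*_{b_n}(x) - \mu(x)| = O_p[\Lambda_n + \chi_n(6)]$. By (7.28), we have $\sup_{x\in T}|\sigma^{*2}_{h_n}(x) - \sigma^2(x)| = O_p[\tilde\chi_n(3) + \ell_n]$. Therefore, with $\hat\epsilon_i = [Y_i - \mu^*_{b_n}(X_i)]/\sigma^*_{h_n}(X_i)$,

$$\sum_{i=1}^n\hat\epsilon_i^4\mathbf{1}_{X_i\in T} = \sum_{i=1}^n\epsilon_i^4\mathbf{1}_{X_i\in T} + O_p\{n[\tilde\chi_n(3) + \chi_n(6) + \ell_n + \Lambda_n]\}. \tag{7.30}$$

By the independence of $\epsilon_i$ and $\mathcal{G}_{i-1}$, $\{[\epsilon_i^4 - E(\epsilon_i^4)]\mathbf{1}_{X_i\in T}\}_{i=1}^n$ form martingale differences with respect to $\mathcal{G}_i$. Since $\epsilon_0 \in \mathcal{L}^6$, $\|\sum_{i=1}^n[\epsilon_i^4 - E(\epsilon_i^4)]\mathbf{1}_{X_i\in T}\|_{3/2} = O(n^{2/3})$. Furthermore, by applying the M/R-decomposition technique in (7.3), we can show that

$$\begin{aligned}
\sum_{i=1}^n[\mathbf{1}_{X_i\in T} - E(\mathbf{1}_{X_i\in T})]
&= \sum_{i=1}^n[\mathbf{1}_{X_i\in T} - E(\mathbf{1}_{X_i\in T} \mid \mathcal{F}_{i-1})] + \sum_{i=1}^n[E(\mathbf{1}_{X_i\in T} \mid \mathcal{F}_{i-1}) - E(\mathbf{1}_{X_i\in T})]\\
&= O_p(\sqrt{n} + H_n^{1/2}).
\end{aligned} \tag{7.31}$$

Thus, the desired result follows from (7.30) and (7.31) via elementary manipulations. □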
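Proposition 2 concerns the estimation of $\nu_\epsilon = E(\epsilon_0^4) - 1$ from residuals with $X_i \in T$. A plug-in sketch, assuming the estimator averages $\hat\epsilon_i^4 = [Y_i - \mu^*_{b_n}(X_i)]^4/\sigma^{*4}_{h_n}(X_i)$ over $X_i \in T$ and subtracts one; this form and the function name `nu_hat` are our assumptions. The default interval is the range of the grid $T(30)$ used for the S&P 500 data.

```python
def nu_hat(X, resid, sigma2_hat, T=(-0.017, 0.017)):
    """Average of standardized fourth powers over X_i in T, minus one (assumed form)."""
    vals = [(r * r / s2) ** 2 for x, r, s2 in zip(X, resid, sigma2_hat) if T[0] <= x <= T[1]]
    if not vals:
        raise ValueError("no design points fall in T")
    return sum(vals) / len(vals) - 1.0

print(nu_hat([0.0, 0.01, 0.02, -0.01], [1.0, -1.0, 5.0, 2.0], [1.0] * 4))  # 5.0
```

Here the point at $0.02$ lies outside $T$ and is ignored, leaving the fourth powers 1, 1 and 16.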